Repeat variable diresidues for targeting nucleotides

ABSTRACT

The present invention relates to polypeptides and more particularly to Transcription Activator-Like Effector derived proteins that allow nucleic acids to be efficiently targeted and/or processed. The present invention also concerns methods of using these proteins. The present invention also relates to vectors, compositions and kits in which RVD domains and Transcription Activator-Like Effector (TALE) proteins of the present invention are used.

FIELD OF THE INVENTION

The present invention relates to polypeptides and more particularly to Transcription Activator-Like Effector derived proteins that allow nucleic acids to be efficiently targeted and/or processed. The present invention also concerns methods of using these proteins. The present invention also relates to vectors, compositions and kits in which Repeat Variable Diresidue (RVD) domains and Transcription Activator-Like Effector (TALE) proteins of the present invention are used.

BACKGROUND OF THE INVENTION

The DNA binding domain of a recently discovered class of proteins derived from Transcription Activator-Like Effectors (TALE) has been widely used for several applications in the field of genome engineering. The sequence specificity of this family of proteins, used in the infection process by plant pathogens of the Xanthomonas genus, is driven by an array of repeat motifs of 33 to 35 amino acids that differ essentially at two positions, 12 and 13 (Boch, Scholze et al. 2009; Moscou and Bogdanove 2009). The recent determination of the high-resolution structure of TAL effectors bound to DNA showed that each single base of the same strand in the DNA target is contacted by a single repeat (Deng, Yan et al. 2012; Mak, Bradley et al. 2012), with the specificity resulting from the two polymorphic amino acids of the repeat, the so-called RVD (Repeat Variable Diresidue). The modularity of these DNA binding domains has been confirmed to a certain extent by the assembly of designed TALE-derived proteins with new specificities.

TAL effectors fused to a nuclease catalytic head (TALE-nucleases), which create new tools especially for genome engineering applications, have been shown to be active to various extents in cell-based assays in yeast, mammalian cells and plants (Christian, Cermak et al. 2010; Cermak, Doyle et al. 2011; Geissler, Scholze et al. 2011; Huang, Xiao et al. 2011; Li, Huang et al. 2011; Mahfouz, Li et al. 2011; Miller, Tan et al. 2011; Morbitzer, Elsaesser et al. 2011; Mussolino, Morbitzer et al. 2011; Sander, Cade et al. 2011; Tesson, Usal et al. 2011; Weber, Gruetzner et al. 2011; Zhang, Cong et al. 2011; Li, Piatek et al. 2012; Mahfouz, Li et al. 2012).

Despite the description in the literature of a dozen natural RVDs and their predicted partner bases, researchers mainly focus on four RVD/base couples: NI/A, HD/C, NN/G and NG/T (Huang, Xiao et al. 2011; Mahfouz, Li et al. 2011; Morbitzer, Elsaesser et al. 2011; Mussolino, Morbitzer et al. 2011; Mahfouz, Li et al. 2012; Mak, Bradley et al. 2012). In a previous study, the DNA binding specificity of alternative RVDs that target the base at the 6th position was tested (WO 2011/146121).

Moreover, up to now, researchers have only published successful uses of TALE-nucleases, without reporting how frequently a TALE-nuclease fails to work. The design of these arrays still relies only on the published code (Boch, Scholze et al. 2009; Moscou and Bogdanove 2009) and in fact leads to a certain number of inactive or weakly active molecules. There remains a need to design new RVDs obeying an improved code that allows TALE/DNA interactions to be governed with high specificity and/or flexibility.

Here, the inventors conjectured that new RVDs, identified by testing their binding to nucleotide bases at the first to the fourth positions of a TALE recognition domain, could replace existing ones and that this replacement could improve the overall specificity of TALE nucleic acid recognition. By proceeding accordingly, the inventors identified a set of new RVDs with useful activity and specificity.

BRIEF SUMMARY OF THE INVENTION

In a general aspect, the present invention relates to polypeptides that allow nucleic acids to be efficiently targeted and/or processed. More particularly, the present invention relates to Transcription Activator-Like Effector derived proteins and particularly to repeat sequences comprising highly specific Repeat Variable-Diresidues (RVDs) that allow nucleic acids to be efficiently targeted and processed. The present invention also concerns methods of using these RVDs and Transcription Activator-Like Effector proteins or chimeric proteins comprising these repeat sequences with RVDs. The present invention also relates to vectors, compositions and kits in which RVDs and Transcription Activator-Like Effector proteins of the present invention are used.

BRIEF DESCRIPTION OF THE FIGURES AND TABLES

In addition to the preceding features, the invention further comprises other features which will emerge from the description which follows, as well as from the appended drawings. A more complete appreciation of the invention and many of the attendant advantages thereof will be readily obtained as the same becomes better understood by reference to the following Figures in conjunction with the detailed description below.

FIG. 1 : Schematic representation of the solid support method for synthesizing RVD arrays used to prepare the libraries 1 to 8.

FIG. 2 : Schematic representation of the solid support method for synthesizing RVD arrays used to prepare the libraries A, B, C and D.

FIG. 3 : a-c: TALE-Nuclease cleavage activity levels of individual clones of the library A on their respective targets (SEQ ID NO: 94 to SEQ ID NO: 97) containing A, C, G or T at the position 1 of the TALE array in our yeast SSA assay previously described (International PCT Application WO 2004/067736; Epinat, Arnould et al. 2003; Chames, Epinat et al. 2005; Arnould, Chames et al. 2006; Smith, Grizot et al. 2006). Values range from 0 to 1; the maximal value is 1.

FIG. 4 : a-c: TALE-Nuclease cleavage activity levels of individual clones of the library B on their respective targets (SEQ ID NO: 98 to SEQ ID NO: 101) containing A, C, G or T at the position 2 of the TALE array in our yeast SSA assay previously described (International PCT Application WO 2004/067736; Epinat, Arnould et al. 2003; Chames, Epinat et al. 2005; Arnould, Chames et al. 2006; Smith, Grizot et al. 2006). Values range from 0 to 1; the maximal value is 1.

FIG. 5 : a-d: TALE-Nuclease cleavage activity levels of individual clones of the library C on their respective targets (SEQ ID NO: 102 to SEQ ID NO: 105) containing A, C, G or T at the position 3 of the TALE array in our yeast SSA assay previously described (International PCT Application WO 2004/067736; Epinat, Arnould et al. 2003; Chames, Epinat et al. 2005; Arnould, Chames et al. 2006; Smith, Grizot et al. 2006). Values range from 0 to 1; the maximal value is 1.

FIG. 6 : a-c: TALE-Nuclease cleavage activity levels of individual clones of the library D on their respective targets (SEQ ID NO: 106 to SEQ ID NO: 109) containing A, C, G or T at the position 4 of the TALE array in our yeast SSA assay previously described (International PCT Application WO 2004/067736; Epinat, Arnould et al. 2003; Chames, Epinat et al. 2005; Arnould, Chames et al. 2006; Smith, Grizot et al. 2006). Values range from 0 to 1; the maximal value is 1.

Table 1: List of oligonucleotides (5′→3′) used to introduce diversity in positions 12 and 13 in libraries of a HD bloc in example 1.

Table 2: Target collections for libraries screening in example 1.

Table 3: Mean activities of three clones with one RVD randomized on a series of targets (SEQ ID NO: 62-77) in our yeast SSA assay previously described (International PCT Application WO 2004/067736; Epinat, Arnould et al. 2003; Chames, Epinat et al. 2005; Arnould, Chames et al. 2006; Smith, Grizot et al. 2006) at 30° C. − indicates no detectable activity, + indicates low activity, ++ medium activity and +++ high activity.

Table 4: List of oligonucleotides (5′→3′) used to introduce diversity in positions 12 and 13 of a NG bloc in example 2.

Table 5: List of pseudo-palindromic sequence targets (two identical recognition sequences are placed facing each other on both DNA strands) in our yeast SSA assay previously described (International PCT Application WO 2004/067736; Epinat, Arnould et al. 2003; Chames, Epinat et al. 2005; Arnould, Chames et al. 2006; Smith, Grizot et al. 2006) at 30° C., used for activity screens in yeast of libraries A, B, C and D.

Table 6: List of heterodimeric sequence targets (two different recognition sequences are placed facing each other on both DNA strands) in our yeast SSA assay previously described (International PCT Application WO 2004/067736; Epinat, Arnould et al. 2003; Chames, Epinat et al. 2005; Arnould, Chames et al. 2006; Smith, Grizot et al. 2006) at 37° C., used for activity screens in yeast of NM/LP and SD/VG containing half-TALE-Nuclease.

Table 7: Activities of the three TALE-Nuclease pairs on heterodimeric sequence targets A and B (two identical recognition sequences are placed facing each other on both DNA strands) in our yeast SSA assay previously described (International PCT Application WO 2004/067736; Epinat, Arnould et al. 2003; Chames, Epinat et al. 2005; Arnould, Chames et al. 2006; Smith, Grizot et al. 2006) at 30° C. ++ indicates medium activity and +++ high activity.

DETAILED DESCRIPTION OF THE INVENTION

Unless specifically defined herein, all technical and scientific terms used have the same meaning as commonly understood by a skilled artisan in the fields of gene therapy, biochemistry, genetics, and molecular biology.

All methods and materials similar or equivalent to those described herein can be used in the practice or testing of the present invention, with suitable methods and materials being described herein. All publications, patent applications, patents, and other references mentioned herein are incorporated by reference in their entirety. In case of conflict, the present specification, including definitions, will prevail. Further, the materials, methods, and examples are illustrative only and are not intended to be limiting, unless otherwise specified.

The practice of the present invention will employ, unless otherwise indicated, conventional techniques of cell biology, cell culture, molecular biology, transgenic biology, microbiology, recombinant DNA, and immunology, which are within the skill of the art. Such techniques are explained fully in the literature. See, for example, Current Protocols in Molecular Biology (Frederick M. AUSUBEL, 2000, Wiley and Sons Inc, Library of Congress, USA); Molecular Cloning: A Laboratory Manual, Third Edition, (Sambrook et al, 2001, Cold Spring Harbor, N.Y.: Cold Spring Harbor Laboratory Press); Oligonucleotide Synthesis (M. J. Gait ed., 1984); Mullis et al. U.S. Pat. No. 4,683,195; Nucleic Acid Hybridization (B. D. Hames & S. J. Higgins eds. 1984); Transcription And Translation (B. D. Hames & S. J. Higgins eds. 1984); Culture Of Animal Cells (R. I. Freshney, Alan R. Liss, Inc., 1987); Immobilized Cells And Enzymes (IRL Press, 1986); B. Perbal, A Practical Guide To Molecular Cloning (1984); the series, Methods In ENZYMOLOGY (J. Abelson and M. Simon, eds.-in-chief, Academic Press, Inc., New York), specifically, Vols. 154 and 155 (Wu et al. eds.) and Vol. 185, “Gene Expression Technology” (D. Goeddel, ed.); Gene Transfer Vectors For Mammalian Cells (J. H. Miller and M. P. Calos eds., 1987, Cold Spring Harbor Laboratory); Immunochemical Methods In Cell And Molecular Biology (Mayer and Walker, eds., Academic Press, London, 1987); Handbook Of Experimental Immunology, Volumes I-IV (D. M. Weir and C. C. Blackwell, eds., 1986); and Manipulating the Mouse Embryo, (Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y., 1986).

The present invention allows TALE/nucleic acid interactions to be governed in several directions by using arrays of particular RVDs in the repeat sequences of a TALE. The present invention allows the specificity of an RVD array for one target to be increased compared to all other possible targets, thereby reducing off-target TALE/DNA interactions, by using RVDs that are highly specific compared to natural RVDs.

New RVDs according to the present invention are selected from the group consisting of:

-   II, TI, YI, PI, SI, CL, DL, FL, GL, HL, IL, KL, LL, YL, MM, WY, PV, SW, XF for recognizing A, wherein X represents one amino acid residue selected from the group of A, G, V, L, I, M, S, T, C, P, D, E, F, Y, W, Q, N, H, R and K;
-   RE, QD for recognizing C;
-   NK, RK, ER, FR, GR, LR, QR, RR, VR, WK, YK for recognizing G;
-   PG, AP, LP, MP, VP for recognizing T;
-   CD, DD, FD, LD, TD, AE, EE, KE, QE, YE, CM, IM, NM, PM, QM, SM, YM, VM, FY, GY, KY, MY, NY, RY, SY, YY, HY for recognizing A or C;
-   RG, PH, VH, CK, FK, PK, QK, TK, DN, EN, FN, GN, KN, PN, RN, TN, YN, WN, FQ, GQ, HQ, IQ, QQ, TQ, FT, LT, VT, PR, DS, SS, FV for recognizing A or G;
-   MG, PL, VP for recognizing A or T.

As a non-limiting illustrative example, RVD “IL” can be used as a highly specific RVD for recognizing a nucleotide A in a nucleic acid target sequence. The present invention also allows the flexibility of an RVD array to be increased, thereby targeting more than one target or only a desired set of targets, by locally decreasing the specificity of an RVD; as a non-limiting illustrative example, RVD “VT” can be used as a flexible RVD which is able to recognize A or G in a nucleic acid target sequence. The present invention also allows the activity of an RVD array on a nucleic acid target sequence to be increased or decreased; as a non-limiting illustrative example, RVD “SW” can be used as a specific RVD for recognizing a nucleotide A in a target sequence, as A is the only nucleotide it recognizes, but with less strength than RVD “IL”, which specifically and strongly recognizes a nucleotide A (Table 3; SEQ ID: 19-25). Several applications may result from the present invention; as a non-limiting example, several allelic polymorphisms (Single Nucleotide Polymorphisms or SNPs) differing by one or a few nucleotide substitutions at a particular genomic locus can be targeted by the same array of RVDs according to the present invention, by using more or less specific and/or more or less flexible and/or more or less active RVDs according to the present invention. A method that could result from the present invention allows the treatment of a particular genetic disease by constructing and administering one unique TALE derived protein or chimeric protein according to the invention to every subject in need thereof, whatever the SNP profile around the mutation responsible for the genetic disease in these subjects. Hence, said method of the present invention avoids the need to construct and administer, for each subject in need thereof, one personalized TALE derived protein or chimeric protein that takes into account each SNP profile around the mutation to cure.
As another non-limiting example, flexible and/or specific and/or active RVDs can be used to target a particular gene in different species, whatever minor variations in gene sequence exist in each targeted species.
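
As a purely illustrative sketch, the RVD groups listed above can be transcribed into a lookup table and used to check which target sequences a given RVD array would be expected to recognize, for instance two alleles that differ by one SNP. The names `RVD_GROUPS`, `RVD_TO_BASES`, `array_recognizes` and `recognized_targets` are hypothetical and are not part of the invention as described. The sketch assumes that each RVD acts independently of its position and of its neighbours, takes the union of bases when an RVD appears in more than one group (as VP does), and ignores differences in activity strength.

```python
from itertools import product

AMINO_ACIDS = "AGVLIMSTCPDEFYWQNHRK"

# Groups transcribed from the list above; "XF" is expanded over all 20 residues.
RVD_GROUPS = {
    "A": "II TI YI PI SI CL DL FL GL HL IL KL LL YL MM WY PV SW "
         + " ".join(x + "F" for x in AMINO_ACIDS),
    "C": "RE QD",
    "G": "NK RK ER FR GR LR QR RR VR WK YK",
    "T": "PG AP LP MP VP",
    "AC": "CD DD FD LD TD AE EE KE QE YE CM IM NM PM QM SM YM VM "
          "FY GY KY MY NY RY SY YY HY",
    "AG": "RG PH VH CK FK PK QK TK DN EN FN GN KN PN RN TN YN WN "
          "FQ GQ HQ IQ QQ TQ FT LT VT PR DS SS FV",
    "AT": "MG PL VP",
}


def build_table(groups):
    """Map each RVD to the set of bases it is listed as recognizing."""
    table = {}
    for bases, rvds in groups.items():
        for rvd in rvds.split():
            # Assumption: an RVD listed in two groups gets the union of bases.
            table.setdefault(rvd, set()).update(bases)
    return table


RVD_TO_BASES = build_table(RVD_GROUPS)


def array_recognizes(rvds, target):
    """True if every RVD of the array lists the base it faces in `target`."""
    return len(rvds) == len(target) and all(
        base in RVD_TO_BASES[rvd] for rvd, base in zip(rvds, target)
    )


def recognized_targets(rvds):
    """Enumerate every target sequence compatible with the RVD array."""
    pools = [sorted(RVD_TO_BASES[rvd]) for rvd in rvds]
    return ["".join(bases) for bases in product(*pools)]


if __name__ == "__main__":
    array = ["IL", "VT", "RE"]  # specific A, flexible A/G, specific C
    print(recognized_targets(array))
    print(array_recognizes(array, "ATC"))
```

Under these assumptions, the array IL-VT-RE is compatible with both AAC and AGC but not with ATC, which mirrors the use of one flexible RVD to cover two alleles with a single array.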

I. TALE Derived Protein Comprising New RVD(s)

In a general aspect, the present invention relates to proteins that allow nucleic acids to be efficiently targeted and/or processed. In a particular aspect, the present invention relates to a protein comprising a repeat domain (also named TALE array) wherein the repeat domain comprises at least one repeat sequence (or repeat unit) derived from a Transcription Activator-Like Effector (TALE), wherein at least one repeat sequence comprises one or more Repeat Variable Diresidue regions (RVDs) according to the present invention, each of which is responsible for the binding of one specific nucleotide in a nucleic acid target sequence.

In an embodiment, said repeat domain comprises a plurality of repeat sequences derived from a TALE. In another embodiment, said repeat domain comprises a plurality of repeat sequences derived from a TALE and at least another repeat sequence not derived from a TALE. In another embodiment, said repeat domain contains a plurality of repeat sequences derived from a TALE and at least another repeat sequence partially derived from a TALE. In another embodiment, said repeat domain contains a plurality of repeat sequences partially derived from a TALE. In another embodiment, said repeat sequences partially derived from a TALE can be obtained using substitution matrices for protein sequence alignment. Non-limiting examples of such substitution matrices include, for example, BLOSUM (Henikoff and Henikoff 1992) or PAM matrices (Dayhoff, M. O., Schwartz, R. and Orcutt, B. C. 1978). As non-limiting illustrative examples, repeat sequences obtained using the BLOSUM substitution matrix are given by SEQ ID NO: 6 to 8. In another embodiment, said repeat sequences partially derived from a TALE can be obtained using homologous protein structures. Non-limiting examples of homologous protein structures include, for example, MTERF1 (mitochondria transcription terminator 1) (Yakubovskaya, Mejia et al. 2010) or tetratricopeptide repeat (TPR)-like domain (Murakami, M. T. et al. 2010). Non-limiting illustrative examples of repeat sequences partially derived from MTERF1 structures are given by SEQ ID NO: 15 to 18. In another embodiment, said repeat sequences not derived (partially derived) from a TALE can be obtained by modifying, as non-limiting examples, loop and/or helix regions. Non-limiting illustrative examples are given by SEQ ID NO: 1-5 and 9-14.

In a preferred embodiment, said repeat domain contains between 8 and 30 repeat sequences derived from a TALE, more preferably between 8 and 20, again more preferably 15. More preferably, a TALE DNA binding domain according to the present invention comprises 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, or 30 repeat sequences.

In another embodiment, said repeat sequences (or repeat units) are made of 30 to 42 amino acids, more preferably 33 to 35 amino acids, again more preferably 33 or 34, wherein two critical amino acids located at positions 12 and 13, i.e. the Repeat Variable-Diresidue (RVD), mediate the recognition of one nucleotide in said nucleic acid target sequence. In another embodiment, RVDs comprise any known amino acid residues in positions 12 and 13. In a preferred embodiment, RVDs comprise one amino acid residue from the group consisting of A, G, V, L, I, M, S, T, C, P, D, E, F, Y, W, Q, N, H, R and K in position 12 according to amino acid one-letter code. In another preferred embodiment, RVDs comprise one amino acid residue from the group consisting of A, G, V, L, I, M, S, T, C, P, D, E, F, Y, W, Q, N, H, R and K in position 13 according to amino acid one-letter code. In another embodiment, RVDs comprise a combination of amino acid residues A, G, V, L, I, M, S, T, C, P, D, E, F, Y, W, Q, N, H, R and K according to amino acid one-letter code in positions 12 and 13 for recognizing nucleotides A, C, G and T in a nucleic acid target sequence. In a preferred embodiment, one or more RVDs of the repeat sequences are selected from the group consisting of:

-   II, TI, YI, PI, SI, CL, DL, FL, GL, HL, IL, KL, LL, YL, MM, WY, PV, SW, XF for recognizing A, wherein X represents one amino acid residue selected from the group consisting of A, G, V, L, I, M, S, T, C, P, D, E, F, Y, W, Q, N, H, R and K;
-   RE, QD for recognizing C;
-   NK, RK, ER, FR, GR, LR, QR, RR, VR, WK, YK for recognizing G;
-   PG, AP, LP, MP, VP for recognizing T;
-   CD, DD, FD, LD, TD, AE, EE, KE, QE, YE, CM, IM, NM, PM, QM, SM, YM, VM, FY, GY, KY, MY, NY, RY, SY, YY, HY for recognizing A or C;
-   RG, PH, VH, CK, FK, PK, QK, TK, DN, EN, FN, GN, KN, PN, RN, TN, YN, WN, FQ, GQ, HQ, IQ, QQ, TQ, FT, LT, VT, PR, DS, SS, FV for recognizing A or G;
-   MG, PL, VP for recognizing A or T.
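
To make the position numbering concrete, the following minimal sketch reads and replaces the residues at positions 12 and 13 (1-based) of a single repeat sequence. The helper names `extract_rvd` and `with_rvd` are hypothetical, and the 34-residue repeat used here is an assumed, illustrative sequence of the kind commonly shown for TALE repeats; it is not taken from the sequence listing of this document.

```python
AMINO_ACIDS = set("AGVLIMSTCPDEFYWQNHRK")


def extract_rvd(repeat):
    """Return the two residues at positions 12 and 13 (1-based) of a repeat."""
    if not 30 <= len(repeat) <= 42:
        raise ValueError("repeat length outside the 30 to 42 residue range")
    # Positions 12 and 13 in 1-based numbering are indices 11 and 12.
    return repeat[11:13]


def with_rvd(repeat, rvd):
    """Return a copy of the repeat carrying `rvd` at positions 12 and 13."""
    if len(rvd) != 2 or not set(rvd) <= AMINO_ACIDS:
        raise ValueError("an RVD is two residues in one-letter code")
    if not 30 <= len(repeat) <= 42:
        raise ValueError("repeat length outside the 30 to 42 residue range")
    return repeat[:11] + rvd + repeat[13:]


# Assumed illustrative repeat (34 residues) carrying HD at positions 12 and 13.
EXAMPLE_REPEAT = "LTPEQVVAIASHDGGKQALETVQRLLPVLCQAHG"

if __name__ == "__main__":
    print(extract_rvd(EXAMPLE_REPEAT))
    print(with_rvd(EXAMPLE_REPEAT, "IL"))
```

Swapping HD for IL in this way changes only positions 12 and 13 and leaves the rest of the repeat scaffold untouched.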

More particularly, the present invention relates to a Transcription Activator-Like Effector (TALE) DNA binding domain specific for a nucleic acid target sequence comprising a plurality of TALE repeat sequences (also named repeat units), each containing a Repeat Variable Diresidue region (RVD) as described above which is responsible for the binding of one specific nucleotide pair in said nucleic acid target sequence. In a particular embodiment, further amino acid substitutions in positions 11 and 14 of one or several repeat sequences of said Transcription Activator-Like Effector (TALE) DNA binding domain specific for a nucleic acid target sequence can be present. Repeat sequences according to the invention can comprise a mutation on residue 14. In another embodiment, repeat sequences comprise one amino acid residue from the group consisting of A, G, V, L, I, M, S, T, C, P, D, E, F, Y, W, Q, N, H, R and K in position 14 according to amino acid one-letter code for recognizing nucleotides A, C, G and T. In another embodiment, RVDs comprise a combination of amino acid residues A, G, V, L, I, M, S, T, C, P, D, E, F, Y, W, Q, N, H, R and K according to amino acid one-letter code in positions 12, 13 and 14 for recognizing nucleotides A, C, G and T in a nucleic acid target sequence. In other words, the scope of the present invention encompasses a Repeat Variable Triresidue responsible for the binding of one nucleotide in a nucleic acid target sequence.

In a further embodiment, repeat sequences comprise a mutation on residue 11 of the repeat sequence and can comprise one amino acid residue from the group consisting of A, G, V, L, I, M, S, T, C, P, D, E, F, Y, W, Q, N, H, R and K in position 11 according to amino acid one-letter code. In another embodiment, RVDs comprise a combination of amino acid residues A, G, V, L, I, M, S, T, C, P, D, E, F, Y, W, Q, N, H, R and K according to amino acid one-letter code in positions 11, 12, 13 and 14 for recognizing nucleotides A, C, G and T in a nucleic acid target sequence. In other words, the present invention encompasses a Repeat Variable Quadriresidue responsible for the binding of one nucleotide in a nucleic acid target sequence. In another embodiment, repeat sequences comprise a combination of amino acid residues A, G, V, L, I, M, S, T, C, P, D, E, F, Y, W, Q, N, H, R and K according to amino acid one-letter code in positions 11, 12 and 14, in positions 11, 13 and 14 or in positions 11, 12 and 13 for recognizing nucleotides A, C, G and T in a nucleic acid target sequence. In another embodiment, repeat sequences comprise a combination of amino acid residues A, G, V, L, I, M, S, T, C, P, D, E, F, Y, W, Q, N, H, R and K according to amino acid one-letter code in positions 12 and 14, 13 and 14, 11 and 14, 11 and 13 or in positions 11 and 12 for recognizing nucleotides A, C, G and T in a nucleic acid target sequence.
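
The sets of variable positions described above, together with the diresidue at positions 12 and 13 and the triresidue at positions 12, 13 and 14, correspond to every subset of at least two positions among 11, 12, 13 and 14. The short sketch below enumerates these subsets and the number of residue combinations each one allows with the 20 listed amino acids. The function names are hypothetical and the count is simple arithmetic, not a statement about which combinations are active.

```python
from itertools import combinations

POSITIONS = (11, 12, 13, 14)
N_RESIDUES = 20  # A, G, V, L, I, M, S, T, C, P, D, E, F, Y, W, Q, N, H, R, K


def variable_position_sets(min_size=2):
    """All subsets of positions 11 to 14 holding at least `min_size` positions."""
    return [
        subset
        for size in range(min_size, len(POSITIONS) + 1)
        for subset in combinations(POSITIONS, size)
    ]


def sequence_space(positions):
    """Number of residue combinations when each position takes any of 20 residues."""
    return N_RESIDUES ** len(positions)


if __name__ == "__main__":
    for subset in variable_position_sets():
        print(subset, sequence_space(subset))
```

The classical RVD at positions 12 and 13 thus spans 400 combinations, a triresidue 8,000 and a quadriresidue 160,000.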

In another embodiment, the combination of amino acid residues present in positions 12 and 13 of a RVD “n” influences the combination of amino acid residues present in positions 12 and 13 of a RVD “n−1” or “n+1” in the repeat domain of the polypeptides of the present invention. In another embodiment, further amino acid substitutions in positions 11 and 14 of a RVD “n” can influence the combination of amino acid residues present in positions 12 and 13 of a RVD “n−1” or “n+1” in the repeat domain of the polypeptides of the present invention.

In a preferred particular embodiment, the repeat domain of the polypeptides of the present invention contains specific pairs of RVDs for recognizing specific pairs of nucleotides A, C, G and T in a nucleic acid target sequence. In another preferred embodiment, said specific pairs of RVDs for recognizing specific pairs of nucleotides A, C, G and T in a nucleic acid target sequence are different from the two RVDs able to individually recognize the nucleotides composing said pair of nucleotides; in other words, said pairs of RVDs contain combinations of amino acid residues in positions 12 and 13 that are different from the combinations of amino acid residues present in positions 12 and 13 of the individual RVDs. As a non-limiting example, in the polypeptides of the present invention a pair of RVDs for recognizing the nucleotide sequence “AG” can comprise amino acid residues in positions 12 and 13 different from pairs “IL-VT” or “VT-VT” that would result from the teaching of individual RVDs recognizing successive nucleotides A and G (Table 3; SEQ ID: 19-25). In another embodiment, further amino acid substitutions in positions 11 and 14 of one or two RVDs of a specific pair of RVDs for recognizing specific pairs of nucleotides A, C, G and T in a nucleic acid target sequence can be present.

In another particular embodiment, the repeat domain of the polypeptides of the present invention contains specific triplets of RVDs for recognizing specific triplets of nucleotides A, C, G and T in a nucleic acid target sequence. In another preferred embodiment, said specific triplets of RVDs for recognizing specific triplets of nucleotides A, C, G and T in a nucleic acid target sequence are different from the three RVDs able to individually recognize the nucleotides composing said triplet of nucleotides; in other words, said triplets of RVDs contain combinations of amino acid residues in positions 12 and 13 that are different from the combinations of amino acid residues present in positions 12 and 13 of the individual RVDs. As a non-limiting example, in the polypeptides of the present invention a triplet of RVDs for recognizing the nucleotide sequence “AGG” can comprise amino acid residues in positions 12 and 13 different from triplets “IL-VT-VT” or “VT-VT-VT” that would result from the teaching of individual RVDs recognizing successive nucleotides A and G (Table 3; SEQ ID: 19-25). In another embodiment, further amino acid substitutions in positions 11 and 14 of one or two or three RVDs of a specific triplet of RVDs for recognizing specific triplets of nucleotides A, C, G and T in a nucleic acid target sequence can be present.
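
The arrays that "would result from the teaching of individual RVDs" can be generated mechanically, which helps to see what a context-specific pair or triplet must differ from. The sketch below is illustrative only: `INDIVIDUAL` is a hypothetical table restricted to the two RVDs named in the examples above (IL for A, VT for A or G), and `naive_arrays` and `is_context_specific` are assumed helper names.

```python
from itertools import product

# Hypothetical, deliberately small table limited to the RVDs cited in the text.
INDIVIDUAL = {"A": ["IL", "VT"], "G": ["VT"]}


def naive_arrays(target):
    """Arrays obtained by choosing one individually suitable RVD per base."""
    pools = [INDIVIDUAL[base] for base in target]
    return ["-".join(choice) for choice in product(*pools)]


def is_context_specific(candidate, target):
    """True if `candidate` differs from every naive array for `target`."""
    return candidate not in naive_arrays(target)


if __name__ == "__main__":
    print(naive_arrays("AG"))
    print(naive_arrays("AGG"))
```

A pair or triplet selected as a unit would then be one for which `is_context_specific` returns True for the corresponding nucleotide sequence.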

II. Chimeric TALE Derived Protein Comprising New RVD(s)

In another embodiment, the present invention relates to a chimeric protein derived from a TALE corresponding to a fusion between a TALE DNA binding domain as mentioned above and an additional protein domain to process the nucleic acid within or adjacent to the specific nucleic acid target sequence. In other words, said polypeptide of the present invention is a chimeric protein derived from a TALE comprising:

-   (a) A Transcription Activator-Like Effector (TALE) DNA binding domain specific for a nucleic acid target sequence comprising a plurality of TALE repeat sequences, each comprising a Repeat Variable Diresidue region (RVD) which is responsible for the binding of one specific nucleotide in said nucleic acid target sequence, wherein one or more RVDs are selected from the group consisting of:
    -   II, TI, YI, PI, SI, CL, DL, FL, GL, HL, IL, KL, LL, YL, MM, WY, PV, SW, XF for recognizing A, wherein X represents one amino acid residue selected from the group consisting of A, G, V, L, I, M, S, T, C, P, D, E, F, Y, W, Q, N, H, R and K;
    -   RE, QD for recognizing C;
    -   NK, RK, ER, FR, GR, LR, QR, RR, VR, WK, YK for recognizing G;
    -   PG, AP, LP, MP, VP for recognizing T;
    -   CD, DD, FD, LD, TD, AE, EE, KE, QE, YE, CM, IM, NM, PM, QM, SM, YM, VM, FY, GY, KY, MY, NY, RY, SY, YY, HY for recognizing A or C;
    -   RG, PH, VH, CK, FK, PK, QK, TK, DN, EN, FN, GN, KN, PN, RN, TN, YN, WN, FQ, GQ, HQ, IQ, QQ, TQ, FT, LT, VT, PR, DS, SS, FV for recognizing A or G;
    -   MG, PL, VP for recognizing A or T;
-   (b) An additional domain to process the nucleic acid within or adjacent to the specific nucleic acid target sequence.

In another embodiment, said chimeric protein according to the present invention can comprise at least one peptidic linker to fuse said TALE DNA binding domain and said additional protein domain processing the nucleic acid. In a preferred embodiment, said peptidic linker is flexible. In another preferred embodiment, said peptidic linker is structured.

In a particular embodiment, the additional protein domain of the chimeric protein of the present invention can be a transcription activator or repressor (i.e. a transcription regulator), or a protein that interacts with or modifies other proteins implicated in DNA processing. Non-limiting examples of DNA processing activities of said chimeric protein of the present invention include, for example, creating or modifying epigenetic regulatory elements, making site-specific insertions, deletions, or repairs in DNA, controlling gene expression, and modifying chromatin structure.

In another particular embodiment, said additional protein domain has catalytic activity selected from the group consisting of nuclease activity, polymerase activity, kinase activity, phosphatase activity, methylase activity, topoisomerase activity, integrase activity, transposase activity, ligase activity, helicase activity and recombinase activity. In a preferred embodiment, said additional protein domain is a nuclease, preferably an endonuclease; in another preferred embodiment, said protein domain is an exonuclease.

When comprising an endonuclease, said chimeric protein of the present invention derived from a TALE is a TALE-nuclease; in other words, in the scope of the present invention is a TALE-nuclease comprising:

-   (a) A Transcription Activator-Like Effector (TALE) DNA binding domain specific for a nucleic acid target sequence comprising a plurality of TALE repeat sequences, each comprising a Repeat Variable Diresidue region (RVD) which is responsible for the binding of one specific nucleotide in said nucleic acid target sequence, wherein one or more RVDs are selected from the group consisting of:
    -   II, TI, YI, PI, SI, CL, DL, FL, GL, HL, IL, KL, LL, YL, MM, WY, PV, SW, XF for recognizing A, wherein X represents one amino acid residue selected from the group consisting of A, G, V, L, I, M, S, T, C, P, D, E, F, Y, W, Q, N, H, R and K;
    -   RE, QD for recognizing C;
    -   NK, RK, ER, FR, GR, LR, QR, RR, VR, WK, YK for recognizing G;
    -   PG, AP, LP, MP, VP for recognizing T;
    -   CD, DD, FD, LD, TD, AE, EE, KE, QE, YE, CM, IM, NM, PM, QM, SM, YM, VM, FY, GY, KY, MY, NY, RY, SY, YY, HY for recognizing A or C;
    -   RG, PH, VH, CK, FK, PK, QK, TK, DN, EN, FN, GN, KN, PN, RN, TN, YN, WN, FQ, GQ, HQ, IQ, QQ, TQ, FT, LT, VT, PR, DS, SS, FV for recognizing A or G;
    -   MG, PL, VP for recognizing A or T;
-   (b) An endonuclease domain to cleave the nucleic acid within or adjacent to the specific nucleic acid target sequence.

In another embodiment, further amino acid substitutions in positions 11 and 14 of one or several RVDs of said chimeric protein or TALE-nuclease according to the present invention can be present.

In a preferred embodiment, said TALE-nuclease according to the present invention can comprise at least one peptidic linker to fuse said TALE DNA binding domain and said endonuclease domain. In a preferred embodiment, said peptidic linker is flexible. In another preferred embodiment, said peptidic linker is structured.

Depending on the endonuclease domain that constitutes said TALE-nuclease according to the present invention, cleavage in the nucleic acid within or adjacent to the specific nucleic acid target sequence corresponds to either a double-stranded break or a single-stranded break.

As a non-limiting example, said endonuclease can be a type IIS FokI endonuclease domain or functional variant thereof which functions independently of the DNA binding domain and induces nucleic acid double-stranded cleavage as a dimer (Li, Wu et al. 1992; Kim, Cha et al. 1996). Amino acid sequences of FokI variants can be prepared by mutations in the DNA which encodes the catalytic domain. Such variants include, for example, deletions from, or insertions or substitutions of, residues within the amino acid sequence. Any combination of deletion, insertion, and substitution may also be made to arrive at the final construct, provided that the final construct possesses the desired activity. Said nuclease domain of a FokI variant according to the present invention comprises a fragment of a protein sequence having at least 80%, more preferably 90%, again more preferably 95% amino acid sequence identity with the protein sequence of FokI. In a particular embodiment, a first and a second chimeric protein can each function as a monomer and act together as a dimer to process the nucleic acid within or adjacent to a specific nucleic acid target. As a non-limiting example, the two monomers can recognize different adjacent nucleic acid target sequences, and the two protein domains constituting each chimeric protein derived from a TALE function as subdomains that need to interact in order to process the nucleic acid within or adjacent to said specific nucleic acid target sequence.

In another particular embodiment, said chimeric protein is a monomeric TALE-nuclease that does not require dimerization for specific recognition and cleavage. As a non-limiting example, such monomeric TALE-nuclease comprises a TALE DNA binding domain fused to the catalytic domain of I-TevI or a variant thereof.

It is understood that RVDs, DNA binding domains, TALE-nucleases, chimeric proteins and polypeptides according to the present invention can also comprise single or plural additional amino acid substitutions, insertions or deletions introduced by mutagenesis processes well known in the art. Also encompassed in the scope of the present invention are variants, functional mutants and derivatives of RVDs, DNA binding domains, TALE-nucleases, chimeric proteins and polypeptides according to the present invention. Also encompassed in the scope of the present invention are RVDs, DNA binding domains, TALE-nucleases, chimeric proteins and polypeptides which present a sequence with a high percentage of identity or a high percentage of homology with sequences of RVDs, DNA binding domains, TALE-nucleases, chimeric proteins and polypeptides according to the present invention, at the nucleotide or polypeptide level. By high percentage of identity or high percentage of homology is intended 70%, more preferably 75%, more preferably 80%, more preferably 85%, more preferably 90%, more preferably 95%, more preferably 97%, more preferably 99%, or any integer comprised between 70% and 99%.
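As a purely illustrative sketch, the identity thresholds stated above could be checked as follows. The function names `percent_identity` and `meets_identity_threshold` are hypothetical, the two sequences are assumed to be already aligned to equal length with `-` marking gaps, and the choice of the alignment length as denominator is an assumption, since the text does not prescribe a particular calculation.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity of two pre-aligned sequences of equal length.

    Gap characters ('-') never count as matches. The denominator is the
    alignment length, which is one common convention among several.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    if not aligned_a:
        raise ValueError("sequences must not be empty")
    matches = sum(
        1 for a, b in zip(aligned_a, aligned_b) if a == b and a != "-"
    )
    return 100.0 * matches / len(aligned_a)


def meets_identity_threshold(aligned_a: str, aligned_b: str,
                             threshold: float = 70.0) -> bool:
    """True if the identity is at least the given threshold (70% by default)."""
    return percent_identity(aligned_a, aligned_b) >= threshold


# Hypothetical 10-residue fragments differing at one position.
identity = percent_identity("LTPEQVVAIA", "LTPDQVVAIA")
```

In this sketch a stricter embodiment is obtained simply by passing a higher `threshold`, for example 95.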

Another aspect of the present invention concerns polynucleotides encoding or comprising a coding sequence for the polypeptides, TALE DNA binding domain, chimeric protein derived from a TALE and TALE-nuclease according to the present invention. Also encompassed is a vector comprising such polynucleotides.

Also encompassed in the scope of the present invention is a host cell which comprises a vector and/or a recombinant polynucleotide encoding or comprising a coding sequence for the polypeptides, TALE DNA binding domain, chimeric protein derived from a TALE and TALE-nuclease according to the present invention.

Also encompassed in the scope of the present invention is a non-human transgenic animal comprising a vector and/or a recombinant polynucleotide encoding or comprising a coding sequence for the polypeptides, TALE DNA binding domain, chimeric protein derived from a TALE and TALE-nuclease according to the present invention.

Also encompassed in the scope of the present invention is a transgenic plant comprising a vector and/or a recombinant polynucleotide encoding or comprising a coding sequence for the polypeptides, TALE DNA binding domain, chimeric protein derived from a TALE and TALE-nuclease according to the present invention.

The present invention also relates to a kit comprising a polypeptide or a TALE DNA binding domain or a chimeric protein derived from a TALE or a TALE-nuclease according to the present invention, or a vector and/or a recombinant polynucleotide encoding or comprising a coding sequence for such recombinant molecules, and instructions for use of said kit.

The present invention also relates to a composition comprising a polypeptide or a TALE DNA binding domain or a chimeric protein derived from a TALE or a TALE-nuclease according to the present invention, or a vector and/or a recombinant polynucleotide encoding or comprising a coding sequence for such recombinant molecules, and a carrier. More preferably, said composition is a pharmaceutical composition comprising such recombinant molecules and a pharmaceutically active carrier. For purposes of therapy, the chimeric protein according to the present invention and a pharmaceutically acceptable excipient are administered in a therapeutically effective amount. Such a combination is said to be administered in a “therapeutically effective amount” if the amount administered is physiologically significant. An agent is physiologically significant if its presence results in a detectable change in the physiology of the recipient. In the present context, an agent is physiologically significant if its presence results in a decrease in the severity of one or more symptoms of the targeted disease and in a genome correction of the lesion or abnormality.

III. Methods

In another aspect, the present invention also relates to methods for use of a protein comprising a TALE domain according to the present invention for various applications ranging from targeted nucleic acid cleavage to targeted gene regulation.

More particularly, the present invention relates to a method for binding a nucleic acid target sequence comprising:

-   (a) Selecting a nucleic acid target sequence;
-   (b) Engineering a protein comprising at least one Transcription Activator-Like Effector (TALE) domain wherein said TALE domain comprises a plurality of TALE repeat sequences comprising each one a Repeat Variable Diresidue region (RVD) which is responsible for the binding of one specific nucleotide in the nucleic acid target sequence, wherein one or more RVD is selected from the group consisting of:
    -   II, TI, YI, PI, SI, CL, DL, FL, GL, HL, IL, KL, LL, YL, MM, WY, PV, SW, XF for recognizing A, wherein X represents one amino acid residue selected from the group consisting of A, G, V, L, I, M, S, T, C, P, D, E, F, Y, W, Q, N, H, R and K;
    -   RE, QD for recognizing C
    -   NK, RK, ER, FR, GR, LR, QR, RR, VR, WK, YK for recognizing G
    -   PG, AP, LP, MP, VP for recognizing T
    -   CD, DD, FD, LD, TD, AE, EE, KE, QE, YE, CM, IM, NM, PM, QM, SM, YM, VM, FY, GY, KY, MY, NY, RY, SY, YY, HY for recognizing A or C
    -   RG, PH, VH, CK, FK, PK, QK, TK, DN, EN, FN, GN, KN, PN, RN, TN, YN, WN, FQ, GQ, HQ, IQ, QQ, TQ, FT, LT, VT, PR, DS, SS, FV for recognizing A or G
    -   MG, PL, VP for recognizing A or T
-   (c) Contacting said engineered protein with said nucleic acid target sequence such that the engineered protein binds to said nucleic acid target sequence.
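The RVD groups listed in step (b) can be read as a lookup table from each diresidue to the nucleotide(s) it recognizes. The following is a minimal sketch under that reading: the groups are transcribed from the list above, while the names `RVD_GROUPS`, `build_rvd_table` and `rvd_array_matches` are hypothetical and chosen for illustration only. An RVD listed in more than one group (VP) is given the union of the listed nucleotides, which is an assumption about how the overlap should be interpreted.

```python
from collections import defaultdict

# RVD groups transcribed from step (b); each key holds the recognized base(s).
RVD_GROUPS = {
    "A": "II TI YI PI SI CL DL FL GL HL IL KL LL YL MM WY PV SW",
    "C": "RE QD",
    "G": "NK RK ER FR GR LR QR RR VR WK YK",
    "T": "PG AP LP MP VP",
    "AC": ("CD DD FD LD TD AE EE KE QE YE CM IM NM PM QM SM YM VM "
           "FY GY KY MY NY RY SY YY HY"),
    "AG": ("RG PH VH CK FK PK QK TK DN EN FN GN KN PN RN TN YN WN "
           "FQ GQ HQ IQ QQ TQ FT LT VT PR DS SS FV"),
    "AT": "MG PL VP",
}
# "XF" stands for any listed residue X followed by F, recognizing A.
XF_FIRST_RESIDUES = "AGVLIMSTCPDEFYWQNHRK"


def build_rvd_table() -> dict:
    """Map each RVD to the set of nucleotides it is listed as recognizing."""
    table = defaultdict(set)
    for bases, rvds in RVD_GROUPS.items():
        for rvd in rvds.split():
            table[rvd].update(bases)  # union if an RVD occurs in two groups
    for x in XF_FIRST_RESIDUES:
        table[x + "F"].add("A")
    return dict(table)


def rvd_array_matches(rvds: list, target: str, table: dict) -> bool:
    """True if the i-th RVD is listed as recognizing the i-th target base."""
    if len(rvds) != len(target):
        return False
    return all(
        base in table.get(rvd, set())
        for rvd, base in zip(rvds, target.upper())
    )


table = build_rvd_table()
ok = rvd_array_matches(["II", "RE", "NK", "PG"], "ACGT", table)
```

The check is positional, one RVD per nucleotide of one strand of the target, in keeping with the one-repeat-per-base recognition described in the background section.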

In a particular embodiment, the present invention relates to a method for processing a genetic material in a cell comprising:

-   (a) Providing a cell comprising a nucleic acid target sequence;
-   (b) Engineering a protein comprising at least one Transcription Activator-Like Effector (TALE) domain wherein said TALE domain comprises a plurality of TALE repeat sequences comprising each one a Repeat Variable Diresidue region (RVD) which is responsible for the binding of one specific nucleotide in the nucleic acid target sequence, wherein one or more RVD is selected from the group consisting of:
    -   II, TI, YI, PI, SI, CL, DL, FL, GL, HL, IL, KL, LL, YL, MM, WY, PV, SW, XF for recognizing A, wherein X represents one amino acid residue selected from the group consisting of A, G, V, L, I, M, S, T, C, P, D, E, F, Y, W, Q, N, H, R and K;
    -   RE, QD for recognizing C
    -   NK, RK, ER, FR, GR, LR, QR, RR, VR, WK, YK for recognizing G
    -   PG, AP, LP, MP, VP for recognizing T
    -   CD, DD, FD, LD, TD, AE, EE, KE, QE, YE, CM, IM, NM, PM, QM, SM, YM, VM, FY, GY, KY, MY, NY, RY, SY, YY, HY for recognizing A or C
    -   RG, PH, VH, CK, FK, PK, QK, TK, DN, EN, FN, GN, KN, PN, RN, TN, YN, WN, FQ, GQ, HQ, IQ, QQ, TQ, FT, LT, VT, PR, DS, SS, FV for recognizing A or G
    -   MG, PL, VP for recognizing A or T
-   (c) Introducing said protein into a cell.

The term “processing” as used herein means that the sequence is considered modified simply by the binding of the protein. Any nucleic acid target sequence can be processed by the present methods. For example, the nucleic acid target sequence can be chromosomal, mitochondrial or chloroplast sequences.

In a more particular embodiment, said engineered protein of step (b) is a chimeric protein as described above further comprising an additional protein domain fused to the TALE domain. In a particular embodiment, the additional protein domain of the chimeric protein of the present invention can be a transcription activator or repressor (i.e. a transcription regulator), or a protein that interacts with or modifies other proteins implicated in DNA processing. Non-limiting examples of DNA processing activities of said chimeric protein of the present invention include, for example, creating or modifying epigenetic regulatory elements, making site-specific insertions, deletions, or repairs in DNA, controlling gene expression, and modifying chromatin structure.

In another embodiment, said additional protein domain has catalytic activity selected from the group consisting of nuclease activity, polymerase activity, kinase activity, phosphatase activity, methylase activity, topoisomerase activity, integrase activity, transposase activity, ligase activity, helicase activity, recombinase activity. In a preferred embodiment, said protein domain is a nuclease, preferably an endonuclease; in another preferred embodiment, said protein domain is an exonuclease.

The present invention more particularly relates to a method for modifying the genetic material of a cell within or adjacent to a nucleic acid target sequence. The double strand breaks caused by endonucleases are commonly repaired through non-homologous end joining (NHEJ). NHEJ comprises at least two different processes. Mechanisms involve rejoining of what remains of the two DNA ends through direct re-ligation (Critchlow and Jackson 1998) or via the so-called microhomology-mediated end joining (Ma, Kim et al. 2003). Repair via NHEJ often results in small insertions or deletions and can be used for the creation of specific gene knockouts. The present invention relates to a method for modifying the genetic material in a cell within or adjacent to a nucleic acid target sequence by using a chimeric protein, preferably a TALE-nuclease according to the present invention, that allows nucleic acid cleavage leading to a loss of genetic information, such that any NHEJ pathway will produce targeted mutagenesis. In a preferred embodiment, the present invention relates to a method for modifying the genetic material of a cell within or adjacent to a nucleic acid target sequence by generating at least one nucleic acid cleavage and a loss of genetic information around said nucleic acid target sequence, thus preventing any scarless re-ligation by NHEJ. Said modification may be a deletion of the genetic material, an insertion of nucleotides in the genetic material or a combination of both deletion and insertion of nucleotides.

The present invention also relates to a method for modifying a nucleic acid target sequence further comprising the step of expressing an additional catalytic domain in a host cell. In a more preferred embodiment, the present invention relates to a method to increase mutagenesis wherein said additional catalytic domain is a DNA end-processing enzyme. Non-limiting examples of DNA end-processing enzymes include 5′-3′ exonucleases, 3′-5′ exonucleases, 5′-3′ alkaline exonucleases, 5′ flap endonucleases, helicases, phosphatases, hydrolases and template-independent DNA polymerases. Non-limiting examples of such catalytic domain comprise a protein domain or catalytically active derivative of the protein domain selected from the group consisting of hExoI (EXO1_HUMAN), Yeast ExoI (EXO1_YEAST), E. coli ExoI, Human TREX2, Mouse TREX1, Human TREX1, Bovine TREX1, Rat TREX1, TdT (terminal deoxynucleotidyl transferase), Human DNA2, Yeast DNA2 (DNA2_YEAST). In a preferred embodiment, said additional catalytic domain has a 3′-5′ exonuclease activity, and in a more preferred embodiment, said additional catalytic domain has TREX exonuclease activity, more preferably TREX2 activity. In another preferred embodiment, said catalytic domain is encoded by a single chain TREX polypeptide. Said additional catalytic domain may be fused to the chimeric protein according to the invention, optionally by a peptide linker. It has been found that the coupling of the enzyme TREX2 with an endonuclease such as a TALE-nuclease ensures high frequency of targeted mutagenesis (WO2012/058458).

In a preferred embodiment, the present invention relates to a method for modifying the genetic material of a cell comprising:

-   (a) Providing a cell comprising a nucleic acid target sequence;
-   (b) Introducing a protein comprising at least:
    -   (i) A Transcription Activator-Like Effector (TALE) DNA binding domain specific for a nucleic acid target sequence comprising a plurality of TALE repeat sequences comprising each one a Repeat Variable Diresidue region (RVD) which is responsible for the binding of one specific nucleotide in said nucleic acid target sequence and wherein said TALE DNA binding domain comprises one or more RVDs selected from the group consisting of:
        -   II, TI, YI, PI, SI, CL, DL, FL, GL, HL, IL, KL, LL, YL, MM, WY, PV, SW, XF for recognizing A, wherein X represents one amino acid residue selected from the group consisting of A, G, V, L, I, M, S, T, C, P, D, E, F, Y, W, Q, N, H, R and K;
        -   RE, QD for recognizing C
        -   NK, RK, ER, FR, GR, LR, QR, RR, VR, WK, YK for recognizing G
        -   PG, AP, LP, MP, VP for recognizing T
        -   CD, DD, FD, LD, TD, AE, EE, KE, QE, YE, CM, IM, NM, PM, QM, SM, YM, VM, FY, GY, KY, MY, NY, RY, SY, YY, HY for recognizing A or C
        -   RG, PH, VH, CK, FK, PK, QK, TK, DN, EN, FN, GN, KN, PN, RN, TN, YN, WN, FQ, GQ, HQ, IQ, QQ, TQ, FT, LT, VT, PR, DS, SS, FV for recognizing A or G
        -   MG, PL, VP for recognizing A or T
    -   (ii) An endonuclease,
-   (c) Inducing the expression of protein of (b);
-   (d) Selecting the cells in which cleavage within or adjacent to the specific nucleic acid target sequence has occurred.

In another embodiment, cells in which said protein has been introduced are selected by a selection method well known in the art. As a non-limiting example, said protein or chimeric protein can be introduced as a transgene encoded by a plasmidic vector; said plasmidic vector contains a selection marker which allows identifying and/or selecting cells which received said vector. Said protein expression can be induced in selected cells and said TALE domain of the protein binds the nucleic acid target sequence in selected cells, thereby obtaining cells in which the TALE domain binds a specific nucleic acid target sequence. The methods of the invention involve introducing a polynucleotide encoding the engineered protein or chimeric protein into a cell. Vectors comprising targeting nucleic acid and/or nucleic acid encoding the engineered protein or chimeric protein according to the present invention can be introduced into a cell by a variety of methods (e.g., injection, direct uptake, projectile bombardment, liposomes, electroporation). Engineered proteins or chimeric proteins according to the present invention can be stably or transiently expressed in cells using expression vectors. Techniques of expression in eukaryotic cells are well known to those in the art. (See Current Protocols in Human Genetics: Chapter 12 “Vectors For Gene Therapy” & Chapter 13 “Delivery Systems for Gene Therapy”). The protein may be synthesized in situ in the cell as a result of the introduction of a polynucleotide encoding the protein into the cell. Alternatively, the protein could be produced outside the cell and then introduced thereto by methods well known in the art.

Cells in which a cleavage-induced mutagenesis event, i.e. a mutagenesis event consecutive to an NHEJ event, has occurred can be identified and/or selected by methods well known in the art. As a non-limiting example, deep-sequencing analysis can be performed on the targeted cell genome around the targeted locus, so that insertion/deletion events (mutagenesis events) can be detected. As another non-limiting example, assays based on T7 endonuclease, which recognizes imperfectly matched DNA, can be used to quantify, from a locus-specific PCR on genomic DNA from the provided cells, mismatches between reannealed DNA strands coming from cleaved and non-cleaved DNA molecules.
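The deep-sequencing readout mentioned above can be illustrated with a minimal sketch that counts reads carrying an insertion or deletion near the expected cleavage position. It assumes that each read has already been pairwise-aligned to the reference amplicon by an external tool and is supplied as a pair of gapped strings; the names `has_indel_near_cut` and `indel_frequency` and the `window` parameter are hypothetical and not taken from the present description.

```python
def has_indel_near_cut(ref_aln: str, read_aln: str,
                       cut_index: int, window: int = 10) -> bool:
    """True if the alignment shows a gap within `window` bases of the cut.

    ref_aln and read_aln are equal-length gapped strings ('-' marks a gap).
    cut_index is the cleavage position in ungapped reference coordinates.
    """
    ref_pos = 0  # position in the ungapped reference
    for ref_char, read_char in zip(ref_aln, read_aln):
        if ref_char == "-" or read_char == "-":  # insertion or deletion
            if abs(ref_pos - cut_index) <= window:
                return True
        if ref_char != "-":
            ref_pos += 1
    return False


def indel_frequency(alignments: list, cut_index: int,
                    window: int = 10) -> float:
    """Fraction of aligned reads with an indel near the cleavage position."""
    if not alignments:
        return 0.0
    mutated = sum(
        has_indel_near_cut(ref, read, cut_index, window)
        for ref, read in alignments
    )
    return mutated / len(alignments)


# Toy alignments against the hypothetical reference ACGTACGTAC.
alignments = [
    ("ACGTACGTAC", "ACGTACGTAC"),    # unmodified read
    ("ACGTACGTAC", "ACGT--GTAC"),    # 2-base deletion at the cut
    ("ACGTA-CGTAC", "ACGTATCGTAC"),  # 1-base insertion at the cut
    ("ACGTACGTAC", "-CGTACGTAC"),    # gap far from the cut, ignored
]
frequency = indel_frequency(alignments, cut_index=5, window=2)
```

Restricting the count to a window around the cleavage position is a design choice of this sketch, intended to avoid counting sequencing or alignment artifacts at read ends as mutagenesis events.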

Endonucleolytic breaks are known to stimulate the rate of homologous recombination. Therefore, in another embodiment, the present invention relates to a method for inducing homologous gene targeting in the nucleic acid target sequence further comprising introducing into the cell an exogenous nucleic acid comprising at least a sequence homologous to a portion of the nucleic acid target sequence, such that homologous recombination occurs between the target nucleic acid sequence and the exogenous nucleic acid. In other words, following cleavage of the nucleic acid target sequence, a homologous recombination event is stimulated between the nucleic acid target sequence and the exogenous nucleic acid. By homologous nucleic acid sequence is meant a nucleic acid sequence with enough identity to another one to lead to homologous recombination between the sequences, more particularly having at least 80% identity, preferably at least 90% identity, more preferably at least 95% identity, and even more preferably 98% identity.

In another embodiment, said exogenous nucleic acid comprises two sequences homologous to portions or adjacent portions of said nucleic acid target sequence flanking a sequence to introduce in the nucleic acid target sequence. Preferably, said exogenous nucleic acid comprises first and second portions which are homologous to regions 5′ and 3′ of the nucleic acid target, respectively. In another embodiment, said exogenous sequence allows introducing new genetic material into a cell. Said exogenous nucleic acid in this embodiment also comprises a third portion positioned between the first and the second portion which comprises no homology with the regions 5′ and 3′ of the nucleic acid target sequence. Said new genetic material introduced into a cell can confer a selective or a commercial advantage to said cell. In another embodiment, said exogenous sequence allows replacing genetic material in a cell. In another embodiment, said exogenous sequence allows repairing genetic material in a cell.

Preferably, homologous sequences of at least 50 bp, preferably more than 100 bp and more preferably more than 200 bp are used within said donor matrix. Therefore, the exogenous nucleic acid is preferably from 200 bp to 6000 bp, more preferably from 1000 bp to 2000 bp. Indeed, shared nucleic acid homologies are located in regions flanking the cleavage site upstream and downstream, and the nucleic acid sequence to be introduced should be located between the two arms.
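The size constraints stated above for the donor matrix lend themselves to a simple design check. The sketch below is illustrative only: the `DonorMatrix` class, the `check_donor` function and their parameter names are hypothetical, and the default limits are the broadest values given in the text (homology arms of at least 50 bp, total length from 200 bp to 6000 bp).

```python
from dataclasses import dataclass


@dataclass
class DonorMatrix:
    """Exogenous nucleic acid: two homology arms flanking the new sequence."""
    left_arm: str   # homologous to the region 5' of the cleavage site
    payload: str    # sequence to introduce, located between the two arms
    right_arm: str  # homologous to the region 3' of the cleavage site

    @property
    def total_length(self) -> int:
        return len(self.left_arm) + len(self.payload) + len(self.right_arm)


def check_donor(donor: DonorMatrix, min_arm: int = 50,
                min_total: int = 200, max_total: int = 6000) -> list:
    """Return a list of human-readable problems; empty if the design passes."""
    problems = []
    for name, arm in (("left", donor.left_arm), ("right", donor.right_arm)):
        if len(arm) < min_arm:
            problems.append(
                f"{name} homology arm is {len(arm)} bp, below {min_arm} bp"
            )
    if not min_total <= donor.total_length <= max_total:
        problems.append(
            f"total length {donor.total_length} bp is outside "
            f"{min_total}-{max_total} bp"
        )
    return problems


# Hypothetical design: 100 bp arms around a 50 bp insert (250 bp in total).
donor = DonorMatrix("A" * 100, "C" * 50, "G" * 100)
issues = check_donor(donor)
```

The more preferred ranges can be checked with the same function, for example `check_donor(donor, min_arm=200, min_total=1000, max_total=2000)`.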

In particular embodiments, said exogenous nucleic acid can comprise a positive selection marker between the two homology arms and optionally a negative selection marker upstream of the first homology arm or downstream of the second homology arm. The marker(s) allow(s) the selection of the cells having inserted the sequence of interest by homologous recombination at the target site. Depending on the location of the targeted genome sequence wherein the break event has occurred, such exogenous nucleic acid can be used to knock out a gene, e.g. when the exogenous nucleic acid is located within the open reading frame of said gene, or to introduce new sequences or genes of interest. Sequence insertions by using such exogenous nucleic acid can be used to modify a targeted existing gene, by correction or replacement of said gene (allele swap as a non-limiting example), or to up- or down-regulate the expression of the targeted gene (promoter swap as a non-limiting example). In a particular embodiment, the exogenous nucleic acid is included in a vector encoding the TALE-derived protein or chimeric protein or, alternatively, in a different vector. In another particular embodiment, the exogenous nucleic acid is a single- or double-stranded oligonucleotide.

Cells in which a homologous recombination event has occurred can be selected by methods well known in the art. As a non-limiting example, PCR analysis using one oligonucleotide matching within the exogenous nucleic acid sequence and one oligonucleotide matching the genomic nucleic acid of cells outside said exogenous nucleic acid but close to the targeted locus can be performed. Therefore, cells in which methods of the invention allowed a mutagenesis event or a homologous recombination event to occur can be selected.

In another embodiment, said exogenous sequence to be introduced into a cell can be optimized so as not to be cleavable by the protein used to generate the initial double-stranded break. In other words, in the case where a nucleic acid target sequence has to be corrected by replacement following a double-stranded break generated by a protein or a chimeric protein according to the present invention, the exogenous replacement sequence can be modified so that it cannot be cleaved again by the original protein or chimeric protein. Said modifications include, as non-limiting examples, silent mutations when the targeted sequence is in a coding sequence of a gene, or mutations when the targeted sequence is in a non-coding sequence of a gene.
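The silent-mutation option described above can be sketched as follows for a coding sequence. This is an illustrative assumption of how such a recoding might be done, not a procedure taken from the present description: the sketch uses the standard genetic code, replaces every codon that has a synonym by the alphabetically first different synonymous codon, and uses the hypothetical names `translate` and `silent_variant`. A practical design would typically change only the codons overlapping the TALE binding sites and would take codon usage into account.

```python
# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(cds: str) -> str:
    """Translate an in-frame coding sequence (length a multiple of 3)."""
    cds = cds.upper()
    if len(cds) % 3:
        raise ValueError("coding sequence length must be a multiple of 3")
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds), 3))


def silent_variant(cds: str) -> str:
    """Recode each codon with a different synonymous codon where one exists."""
    cds = cds.upper()
    if len(cds) % 3:
        raise ValueError("coding sequence length must be a multiple of 3")
    recoded = []
    for i in range(0, len(cds), 3):
        codon = cds[i:i + 3]
        synonyms = sorted(
            other for other, aa in CODON_TABLE.items()
            if aa == CODON_TABLE[codon] and other != codon
        )
        # Met (ATG) and Trp (TGG) have no synonym and are left unchanged.
        recoded.append(synonyms[0] if synonyms else codon)
    return "".join(recoded)


original = "ATGGCTCTGAAA"            # hypothetical fragment encoding M-A-L-K
recoded = silent_variant(original)   # same protein, different nucleotides
```

Because the encoded protein is unchanged while the nucleotide sequence differs, the repaired locus no longer presents the original binding sequence to the protein that generated the break.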

Another aspect of the invention is a method for producing one Transcription Activator-Like Effector (TALE) domain comprising:

-   (a) Determining a nucleic acid target sequence;
-   (b) Synthesizing a repeat sequence domain specific for a nucleic acid target sequence comprising a plurality of TALE repeat sequences comprising each one a Repeat Variable Diresidue region (RVD) which is responsible for the binding of one specific nucleotide in said nucleic acid target sequence, wherein one or more RVD is selected from the group consisting of:
    -   II, TI, YI, PI, SI, CL, DL, FL, GL, HL, IL, KL, LL, YL, MM, WY, PV, SW, XF for recognizing A, wherein X represents one amino acid residue selected from the group consisting of A, G, V, L, I, M, S, T, C, P, D, E, F, Y, W, Q, N, H, R and K;
    -   RE, QD for recognizing C;
    -   NK, RK, ER, FR, GR, LR, QR, RR, VR, WK, YK for recognizing G;
    -   PG, AP, LP, MP, VP for recognizing T;
    -   CD, DD, FD, LD, TD, AE, EE, KE, QE, YE, CM, IM, NM, PM, QM, SM, YM, VM, FY, GY, KY, MY, NY, RY, SY, YY, HY for recognizing A or C;
    -   RG, PH, VH, CK, FK, PK, QK, TK, DN, EN, FN, GN, KN, PN, RN, TN, YN, WN, FQ, GQ, HQ, IQ, QQ, TQ, FT, LT, VT, PR, DS, SS, FV for recognizing A or G;
    -   MG, PL, VP for recognizing A or T.

In a particular embodiment, the present invention relates to a method for producing a chimeric protein further comprising:

-   (c) Providing an additional protein domain to process the nucleic acid within or adjacent to the specific nucleic acid target sequence;
-   (d) Optionally designing a peptidic linker to link TALE domain with said additional protein domain;
-   (e) Assembling said chimeric protein.

The scope of the present invention also encompasses a chimeric protein obtainable by a method comprising at least the steps of:

-   (a) Determining a nucleic acid target sequence;
-   (b) Synthesizing a repeat sequence domain specific for a nucleic acid target sequence comprising a plurality of TALE repeat sequences comprising each one a Repeat Variable Diresidue region (RVD) which is responsible for the binding of one specific nucleotide in said nucleic acid target sequence, wherein one or more RVD is selected from the group consisting of:
    -   II, TI, YI, PI, SI, CL, DL, FL, GL, HL, IL, KL, LL, YL, MM, WY, PV, SW, XF for recognizing A, wherein X represents one amino acid residue selected from the group consisting of A, G, V, L, I, M, S, T, C, P, D, E, F, Y, W, Q, N, H, R and K;
    -   RE, QD for recognizing C
    -   NK, RK, ER, FR, GR, LR, QR, RR, VR, WK, YK for recognizing G
    -   PG, AP, LP, MP, VP for recognizing T
    -   CD, DD, FD, LD, TD, AE, EE, KE, QE, YE, CM, IM, NM, PM, QM, SM, YM, VM, FY, GY, KY, MY, NY, RY, SY, YY, HY for recognizing A or C
    -   RG, PH, VH, CK, FK, PK, QK, TK, DN, EN, FN, GN, KN, PN, RN, TN, YN, WN, FQ, GQ, HQ, IQ, QQ, TQ, FT, LT, VT, PR, DS, SS, FV for recognizing A or G
    -   MG, PL, VP for recognizing A or T
-   (c) Providing an additional protein domain to process the nucleic acid within or adjacent to the specific nucleic acid target sequence;
-   (d) Optionally designing a peptidic linker to link polypeptides obtained in b) and c);
-   (e) Assembling said chimeric protein;
-   (f) Testing the activity of said chimeric protein.

In a further embodiment, synthesis step b) can be done using a solid support method composed of consecutive restriction/ligation/washing steps as shown in FIG. 1 and the examples section; step c) can be done by cloning said protein domain of interest into a plasmidic vector; in the case where said chimeric protein according to the invention is a TALE-nuclease, as a non-limiting example, said protein domain can be cloned together in a same vector with the chosen peptidic linker and optional additional N- and C-terminal backbones for an RVD. Assembling step e) can be done by cloning the repeat sequence domain of step b) in the vector resulting from step c). Testing step f) can be done, in the case where said chimeric protein is a TALE-nuclease as a non-limiting example, in yeast by using a yeast target reporter plasmid containing the nucleic acid target sequence as previously described (International PCT Application WO 2004/067736; Epinat, Arnould et al. 2003; Chames, Epinat et al. 2005; Arnould, Chames et al. 2006; Smith, Grizot et al. 2006). The activity of said TALE-nuclease can be tested at 30° C. and 37° C. in a yeast SSA assay previously described (International PCT Application WO 2004/067736; Epinat, Arnould et al. 2003; Chames, Epinat et al. 2005; Arnould, Chames et al. 2006; Smith, Grizot et al. 2006).

In another embodiment, the cell targeted or modified by the methods of the present invention is a eukaryotic cell, preferably a mammalian cell, a plant cell or an algal cell.

In another embodiment, the nucleic acid sequence targeted or modified by the methods of the present invention is a chromosomal sequence or an episomal sequence. In another embodiment, said sequence is an organelle sequence.

The present invention also relates to a method for generating a plant comprising providing a plant cell comprising a nucleic acid target sequence into which it is desired to introduce a genetic modification; generating a cleavage within or adjacent to the nucleic acid target sequence by introducing a chimeric protein such as a TALE-nuclease according to the present invention; and generating a plant from the cell or progeny thereof, in which cleavage has occurred. Progeny includes descendants of a particular plant or plant line. In a particular embodiment, the method for generating a plant further comprises introducing an exogenous nucleic acid as desired. Said exogenous nucleic acid comprises a sequence homologous to at least a portion of the nucleic acid target sequence, such that homologous recombination occurs between said exogenous nucleic acid and the nucleic acid target sequence in the cell or progeny thereof. Plant cells produced using these methods can be grown to generate plants having in their genome a modified nucleic acid target sequence. Seeds from such plants can be used to generate plants having a phenotype such as, for example, an altered growth characteristic, altered appearance, or altered compositions with respect to unmodified plants.

The polypeptides of the invention are useful to engineer genomes and to reprogram cells, especially induced Pluripotent Stem cells (iPS) and embryonic stem (ES) cells, preferably non-human ES cells.

Other Definitions

-   Amino acid residues in a polypeptide sequence are designated herein according to the one-letter code, in which, for example, Q means Gln or Glutamine residue, R means Arg or Arginine residue and D means Asp or Aspartic acid residue.
-   Amino acid substitution means the replacement of one amino acid residue with another, for instance the replacement of an Arginine residue with a Glutamine residue in a peptide sequence is an amino acid substitution.
-   DNA or nucleic acid processing activity refers to a particular/given enzymatic activity of a protein domain comprised in a chimeric protein or a polypeptide according to the invention such as in the expression “an additional protein domain to process the nucleic acid within or adjacent to the specific nucleic acid target sequence”. Said DNA or nucleic acid processing activity can refer to a cleavage activity, either a cleavase activity or a nickase activity, more broadly a nuclease activity, but also a polymerase activity, a kinase activity, a phosphatase activity, a methylase activity, a topoisomerase activity, an integrase activity, a transposase activity, a ligase, a helicase or recombinase activity as non-limiting examples.
-   Nucleotides are designated as follows: one-letter code is used for designating the base of a nucleoside: a is adenine, t is thymine, c is cytosine, and g is guanine. For the degenerated nucleotides, r represents g or a (purine nucleotides), k represents g or t, s represents g or c, w represents a or t, m represents a or c, y represents t or c (pyrimidine nucleotides), d represents g, a or t, v represents g, a or c, b represents g, t or c, h represents a, t or c, and n represents g, a, t or c.
-   By “peptide linker” or “peptidic linker” it is intended to mean a peptide sequence which allows the connection of different monomers or different parts comprised in a fusion protein such as between a TALE DNA binding domain and a protein domain in a chimeric protein or a polypeptide according to the present invention and which allows the adoption of a correct conformation for said chimeric protein activity and/or specificity. Peptide linkers can be of various sizes, from 3 amino acids to 50 amino acids as a non-limiting indicative range. Peptide linkers can also be qualified as structured or unstructured. Peptide linkers can be qualified as active linkers when they comprise active domains that are able to change their structural conformation under appropriate stimulation.
-   By “subdomain” or “domain” it is intended a protein subdomain or a protein part that interacts with another protein subdomain or protein part to form an active entity and/or a catalytic active entity bearing nucleic acid or DNA processing activity of said chimeric protein or polypeptide according to the invention.
-   By “DNA target”, “DNA target sequence”, “target DNA sequence”, “nucleic acid target sequence”, “target nucleic acid sequence”, “target sequence”, or “processing site” is intended a polynucleotide sequence that can be processed by a TALE derived protein or chimeric protein according to the present invention. These terms refer to a specific nucleic acid location, preferably a genomic location in a cell, but also a portion of genetic material that can exist independently of the main body of genetic material such as plasmids, episomes, virus, transposons or in organelles such as mitochondria or chloroplasts as non-limiting examples. The nucleic acid target sequence is defined by the 5′ to 3′ sequence of one strand of said target, as indicated for SEQ ID NO: 62-77 in table 2 and SEQ ID NO: 94-109 in table 5 as a non-limiting example.
-   Adjacent is used to distinguish between 1) the nucleic acid sequence recognized and bound by a set of specific RVDs comprised in the TALE DNA binding domain of a polypeptide or a chimeric protein according to the present invention and 2) the nucleic acid target sequence to be processed by said polypeptide or chimeric protein according to the invention, said nucleic sequences 1) and 2) being adjacent.
-   By “delivery vector” or “delivery vectors” is intended any delivery vector which can be used in the present invention to put into cell contact (i.e. “contacting”) or deliver inside cells or subcellular compartments agents/chemicals and molecules (proteins or nucleic acids) needed in the present invention. It includes, but is not limited to liposomal delivery vectors, viral delivery vectors, drug delivery vectors, chemical carriers, polymeric carriers, lipoplexes, polyplexes, dendrimers, microbubbles (ultrasound contrast agents), nanoparticles, emulsions or other appropriate transfer vectors. These delivery vectors allow delivery of molecules, chemicals, macromolecules (genes, proteins), or other vectors such as plasmids, peptides developed by Diatos. In these cases, delivery vectors are molecule carriers. By “delivery vector” or “delivery vectors” is also intended delivery methods to perform transfection.
-   The terms “vector” or “vectors” refer to a nucleic acid molecule capable of transporting another nucleic acid to which it has been linked. A “vector” in the present invention includes, but is not limited to, a viral vector, a plasmid, a RNA vector or a linear or circular DNA or RNA molecule which may consist of chromosomal, non-chromosomal, semi-synthetic or synthetic nucleic acids. Preferred vectors are those capable of autonomous replication (episomal vector) and/or expression of nucleic acids to which they are linked (expression vectors). Large numbers of suitable vectors are known to those of skill in the art and commercially available.

Viral vectors include retrovirus, adenovirus, parvovirus (e.g. adeno-associated viruses), coronavirus, negative strand RNA viruses such as orthomyxovirus (e.g., influenza virus), rhabdovirus (e.g., rabies and vesicular stomatitis virus), paramyxovirus (e.g. measles and Sendai), positive strand RNA viruses such as picornavirus and alphavirus, and double-stranded DNA viruses including adenovirus, herpesvirus (e.g., Herpes Simplex virus types 1 and 2, Epstein-Barr virus, cytomegalovirus), and poxvirus (e.g., vaccinia, fowlpox and canarypox). Other viruses include Norwalk virus, togavirus, flavivirus, reoviruses, papovavirus, hepadnavirus, and hepatitis virus, for example. Examples of retroviruses include: avian leukosis-sarcoma, mammalian C-type, B-type viruses, D type viruses, HTLV-BLV group, lentivirus, spumavirus (Coffin, J. M., Retroviridae: The viruses and their replication, In Fundamental Virology, Third Edition, B. N. Fields, et al., Eds., Lippincott-Raven Publishers, Philadelphia, 1996).

By “lentiviral vector” is meant HIV-based lentiviral vectors that are very promising for gene delivery because of their relatively large packaging capacity, reduced immunogenicity and their ability to stably transduce with high efficiency a large range of different cell types. Lentiviral vectors are usually generated following transient transfection of three (packaging, envelope and transfer) or more plasmids into producer cells. Like HIV, lentiviral vectors enter the target cell through the interaction of viral surface glycoproteins with receptors on the cell surface. On entry, the viral RNA undergoes reverse transcription, which is mediated by the viral reverse transcriptase complex. The product of reverse transcription is a double-stranded linear viral DNA, which is the substrate for viral integration in the DNA of infected cells.

By “integrative lentiviral vectors (or LV)” is meant, as a non-limiting example, such vectors that are able to integrate into the genome of a target cell.

Conversely, by “non-integrative lentiviral vectors (or NILV)” is meant efficient gene delivery vectors that do not integrate into the genome of a target cell through the action of the viral integrase.

One type of preferred vector is an episome, i.e., a nucleic acid capable of extra-chromosomal replication. Preferred vectors are those capable of autonomous replication and/or expression of nucleic acids to which they are linked. Vectors capable of directing the expression of genes to which they are operatively linked are referred to herein as “expression vectors”. A vector according to the present invention comprises, but is not limited to, a YAC (yeast artificial chromosome), a BAC (bacterial artificial chromosome), a baculovirus vector, a phage, a phagemid, a cosmid, a viral vector, a plasmid, an RNA vector or a linear or circular DNA or RNA molecule which may consist of chromosomal, non-chromosomal, semi-synthetic or synthetic DNA. In general, expression vectors of utility in recombinant DNA techniques are often in the form of “plasmids”, which refer generally to circular double-stranded DNA loops which, in their vector form, are not bound to the chromosome. Large numbers of suitable vectors are known to those of skill in the art. Vectors can comprise selectable markers, for example: neomycin phosphotransferase, histidinol dehydrogenase, dihydrofolate reductase, hygromycin phosphotransferase, herpes simplex virus thymidine kinase, adenosine deaminase, glutamine synthetase, and hypoxanthine-guanine phosphoribosyl transferase for eukaryotic cell culture; TRP1 for S. cerevisiae; tetracycline, rifampicin or ampicillin resistance in E. coli. Preferably said vectors are expression vectors, wherein a sequence encoding a polypeptide of interest is placed under control of appropriate transcriptional and translational control elements to permit production or synthesis of said polypeptide. Therefore, said polynucleotide is comprised in an expression cassette. More particularly, the vector comprises a replication origin, a promoter operatively linked to said encoding polynucleotide, a ribosome binding site, an RNA-splicing site (when genomic DNA is used), a polyadenylation site and a transcription termination site. It can also comprise enhancer or silencer elements. Selection of the promoter will depend upon the cell in which the polypeptide is expressed. Suitable promoters include tissue specific and/or inducible promoters. Examples of inducible promoters are: eukaryotic metallothionein promoter which is induced by increased levels of heavy metals, prokaryotic lacZ promoter which is induced in response to isopropyl-β-D-thiogalacto-pyranoside (IPTG) and eukaryotic heat shock promoter which is induced by increased temperature. Examples of tissue specific promoters are skeletal muscle creatine kinase, prostate-specific antigen (PSA), α-antitrypsin protease, human surfactant (SP) A and B proteins, β-casein and acidic whey protein genes.

Inducible promoters may be induced by pathogens or stress, more preferably by stress like cold, heat, UV light, or high ionic concentrations (reviewed in Potenza C et al. 2004, In vitro Cell Dev Biol 40:1-22). Inducible promoters may also be induced by chemicals (reviewed in Moore, Samalova et al. 2006; Padidam 2003; Wang, Zhou et al. 2003; Zuo and Chua 2000).

Delivery vectors and vectors can be associated or combined with any cellular permeabilization techniques such as sonoporation or electroporation or derivatives of these techniques.

By “cell” or “cells” is intended any prokaryotic or eukaryotic living cells, cell lines derived from these organisms for in vitro cultures, and primary cells of animal or plant origin.

By “primary cell” or “primary cells” are intended cells taken directly from living tissue (i.e. biopsy material) and established for growth in vitro, that have undergone very few population doublings and are therefore more representative of the main functional components and characteristics of the tissues from which they are derived, in comparison to continuous tumorigenic or artificially immortalized cell lines. These cells thus represent a more valuable model of the in vivo state they refer to.

In the frame of the present invention, “eukaryotic cells” refer to a fungal, plant or animal cell or a cell line derived from the organisms listed below and established for in vitro culture. More preferably, the fungus is of the genus Aspergillus, Penicillium, Acremonium, Trichoderma, Chrysosporium, Mortierella, Kluyveromyces or Pichia; more preferably, the fungus is of the species Aspergillus niger, Aspergillus nidulans, Aspergillus oryzae, Aspergillus terreus, Penicillium chrysogenum, Penicillium citrinum, Acremonium chrysogenum, Trichoderma reesei, Mortierella alpina, Chrysosporium lucknowense, Kluyveromyces lactis, Pichia pastoris or Pichia ciferrii.

More preferably the plant is of the genus Arabidopsis, Nicotiana, Solanum, Lactuca, Brassica, Oryza, Asparagus, Pisum, Medicago, Zea, Hordeum, Secale, Triticum, Capsicum, Cucumis, Cucurbita, Citrullus, Citrus, Sorghum; more preferably, the plant is of the species Arabidopsis thaliana, Nicotiana tabacum, Solanum lycopersicum, Solanum tuberosum, Solanum melongena, Solanum esculentum, Lactuca sativa, Brassica napus, Brassica oleracea, Brassica rapa, Oryza glaberrima, Oryza sativa, Asparagus officinalis, Pisum sativum, Medicago sativa, Zea mays, Hordeum vulgare, Secale cereale, Triticum aestivum, Triticum durum, Capsicum sativus, Cucurbita pepo, Citrullus lanatus, Cucumis melo, Citrus aurantifolia, Citrus maxima, Citrus medica, Citrus reticulata.

More preferably the animal cell is of the genus Homo, Rattus, Mus, Sus, Bos, Danio, Canis, Felis, Equus, Salmo, Oncorhynchus, Gallus, Meleagris, Drosophila, Caenorhabditis; more preferably, the animal cell is of the species Homo sapiens, Rattus norvegicus, Mus musculus, Sus scrofa, Bos taurus, Danio rerio, Canis lupus, Felis catus, Equus caballus, Salmo salar, Oncorhynchus mykiss, Gallus gallus, Meleagris gallopavo, Drosophila melanogaster, Caenorhabditis elegans.

In the present invention, the cell can be a plant cell, a mammalian cell, a fish cell, an insect cell or cell lines derived from these organisms for in vitro cultures or primary cells taken directly from living tissue and established for in vitro culture. As non-limiting examples, cell lines can be selected from the group consisting of CHO-K1 cells; HEK293 cells; Caco2 cells; U2-OS cells; NIH 3T3 cells; NSO cells; SP2 cells; CHO-S cells; DG44 cells; K-562 cells; U-937 cells; MRC5 cells; IMR90 cells; Jurkat cells; HepG2 cells; HeLa cells; HT-1080 cells; HCT-116 cells; Hu-h7 cells; Huvec cells; Molt 4 cells.

All these cell lines can be modified by the method of the present invention to provide cell line models to produce, express, quantify, detect, study a gene or a protein of interest; these models can also be used to screen biologically active molecules of interest in research and production and various fields such as chemical, biofuels, therapeutics and agronomy as non-limiting examples.

-   by “mutation” is intended the substitution, deletion, insertion of one or more nucleotides/amino acids in a polynucleotide (cDNA, gene) or a polypeptide sequence. Said mutation can affect the coding sequence of a gene or its regulatory sequence. It may also affect the structure of the genomic sequence or the structure/stability of the encoded mRNA.
-   In the frame of the present invention, the expression “cleavage-induced mutagenesis”, preferably Double-Strand Break (DSB)-induced mutagenesis, refers to a mutagenesis event consecutive to an NHEJ event following an endonuclease-induced cleavage, leading to insertion/deletion at the cleavage site of an endonuclease.
-   By “gene” is meant the basic unit of heredity, consisting of a segment of DNA arranged in a linear manner along a chromosome, which codes for a specific protein or segment of protein. A gene typically includes a promoter, a 5′ untranslated region, one or more coding sequences (exons), optionally introns, a 3′ untranslated region. The gene may further comprise a terminator, enhancers and/or silencers.
-   As used herein, the term “locus” is the specific physical location of a nucleic acid sequence (e.g. of a gene) on a chromosome. The term “locus” usually refers to the specific physical location of a protein or chimeric protein's nucleic acid target sequence on a chromosome. Such a locus can comprise a target sequence that is recognized and/or cleaved by a protein or a chimeric protein according to the invention. It is understood that the locus of interest of the present invention can not only qualify a nucleic acid sequence that exists in the main body of genetic material (i.e. in a chromosome) of a cell but also a portion of genetic material that can exist independently of said main body of genetic material, such as plasmids, episomes, viruses, transposons or in organelles such as mitochondria or chloroplasts as non-limiting examples.
-   By “fusion protein” is intended the result of a well-known process in the art consisting in the joining of two or more genes which originally encode separate proteins or parts of them, the translation of said “fusion gene” resulting in a single polypeptide with functional properties derived from each of the original proteins.
-   By “chimeric protein” according to the present invention is meant any fusion protein comprising at least one RVD to bind a nucleic acid sequence and one additional protein domain to process a nucleic acid target sequence within or adjacent to said bound nucleic acid sequence.
-   By “additional protein domain” or “protein domain” is meant the nucleic acid target sequence processing part of said chimeric protein according to the present invention. Said protein domain can provide any catalytic activity as classified and named according to the reaction it catalyzes [Enzyme Commission number (EC number) at http://www.chem.qmul.ac.uk/iubmb/enzyme/]. Said protein domain can be a catalytically active entity by itself. Said protein domain can be a protein subdomain that needs to interact with another protein subdomain to form a dimeric protein domain active entity.
-   By a “TALE-nuclease” (TALEN) is intended a fusion protein consisting of a DNA-binding domain derived from a Transcription Activator Like Effector (TALE) and one nuclease catalytic domain to cleave a nucleic acid target sequence. Said TALE-nuclease is a subclass of chimeric protein according to the present invention.
-   by “variant(s)”, it is intended an RVD variant, a chimeric protein variant, a DNA binding variant, a TALE-nuclease variant, a polypeptide variant obtained by replacement of at least one residue in the amino acid sequence of the parent molecule.
-   by “functional mutant” is intended a catalytically active mutant of a protein or a protein domain; such mutant can have the same activity compared to its parent protein or protein domain or additional properties. This definition applies to chimeric proteins or protein domains that constitute chimeric proteins according to the present invention. Also encompassed in the scope of this definition are “derivatives” of these proteins or protein domains that comprise the entirety or part of these proteins or protein domains fused to other proteic or chemical parts such as tags, antibodies, polyethylene glycol as non-limiting examples.
-   “identity” refers to sequence identity between two nucleic acid molecules or polypeptides. Identity can be determined by comparing a position in each sequence which may be aligned for purposes of comparison. When a position in the compared sequence is occupied by the same base, then the molecules are identical at that position. A degree of similarity or identity between nucleic acid or amino acid sequences is a function of the number of identical or matching nucleotides at positions shared by the nucleic acid sequences. Various alignment algorithms and/or programs may be used to calculate the identity between two sequences, including FASTA, or BLAST which are available as a part of the GCG sequence analysis package (University of Wisconsin, Madison, Wis.), and can be used with, e.g., default settings.
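As an illustration of the “identity” definition, the following minimal Python sketch counts matching positions between two sequences that have already been aligned. It is not the FASTA or BLAST procedure: the function name `percent_identity`, the gap character and the choice to ignore gapped columns in the denominator are assumptions made for this example, and alignment programs may normalize differently.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over aligned columns in which neither sequence has a gap.

    Both inputs are assumed to be pre-aligned (same length, gaps as `gap`).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be pre-aligned to the same length")
    # Keep only the columns where both sequences carry a residue.
    columns = [
        (a, b)
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if a != gap and b != gap
    ]
    if not columns:
        return 0.0
    matches = sum(a == b for a, b in columns)
    return 100.0 * matches / len(columns)


if __name__ == "__main__":
    print(percent_identity("ACGT", "ACGA"))
    print(percent_identity("AC-T", "ACGT"))
```

The same function works for amino acid sequences, since it only compares characters position by position.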

The above written description of the invention provides a manner and process of making and using it such that any person skilled in this art is enabled to make and use the same, this enablement being provided in particular for the subject matter of the appended claims, which make up a part of the original description.

As used above, the phrases “selected from the group consisting of,” “chosen from,” and the like include mixtures of the specified materials.

Where a numerical limit or range is stated herein, the endpoints are included. Also, all values and subranges within a numerical limit or range are specifically included as if explicitly written out.

The above description is presented to enable a person skilled in the art to make and use the invention, and is provided in the context of a particular application and its requirements. Various modifications to the preferred embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments and applications without departing from the spirit and scope of the invention. Thus, this invention is not intended to be limited to the embodiments shown, but is to be accorded the widest scope consistent with the principles and features disclosed herein.

Having generally described this invention, a further understanding can be obtained by reference to certain specific examples, which are provided herein for purposes of illustration only, and are not intended to be limiting unless otherwise specified.

EXAMPLES

A first characterization of the activity, in yeast, of libraries having position 12 and/or 13 randomized (based on an HD scaffold, SEQ ID NO: 19) was performed. The randomization was performed on the RVD in position 1 and/or on the RVDs in position 1 and 2 according to the target.

Libraries on Position 12 and 13

Eight libraries (lib1 to 8) which contain only a subset of the 20 possible natural amino acids and one library (lib9) containing the 20 possible amino acids were first used. The randomization of positions 12 and 13 was performed using degenerate oligonucleotides (Table 1; SEQ ID NO: 26-39) and conventional Overlap Extension (OE) PCR techniques using an HD mono-RVD in a pAPG10 plasmid (SEQ ID NO: 40) as template.

TABLE 1 List of oligonucleotides (5′→3′) used to introduce diversity in positions 12 and 13 in libraries of an HD block.

Library  Oligonucleotide  Sequence 5′->3′                                     SEQ ID NO:  Diversity Mono-RVD  Diversity Di-RVD
-        A1               cccagtcacgacgttgtaaaac                              26          -                   -
Lib 1    B1               gtctccagcgcctgcttgccgcccHNSaYgctggcgatggccacctgctc  27          48                  2304
Lib 2    B2               gtctccagcgcctgcttgccgccaBNSaYgctggcgatggccacctgctc  28          48                  2304
Lib 3    B3               gtctccagcgcctgcttgccgcccHNaSYgctggcgatggccacctgctc  29          48                  2304
Lib 4    B4               gtctccagcgcctgcttgccgccHBNSaYgctggcgatggccacctgctc  30          144                 20736
Lib 5    B5               gtctccagcgcctgcttgccgccGWDMHagctggcgatggccacctgctc  31          36                  1296
Lib 6    B6               gtctccagcgcctgcttgccgccMHaMHagctggcgatggccacctgctc  32          36                  1296
Lib 7    B7               gtctccagcgcctgcttgccgccaSYcHNgctggcgatggccacctgctc  33          48                  2304
Lib 8    B8               gtctccagcgcctgcttgccgccMTYcHNgctggcgatggccacctgctc  34          48                  2304
-        C1               cacaggaaacagctatgaccatg                             35          -                   -
-        D1               ggcaagcaggcgctggagacgg                              36          -                   -
Lib 9    B9               gtctccagcgcctgcttgccgccMNNMNNgctggcgatggccacctgctc  37          1024                -
-        A2               cccagtcacgacgttgtaaaac                              38          -                   -
-        C2               cccggtaccgcatctcgagg                                39          -                   -
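The diversity figures of Table 1 can be related to the degenerate positions of each oligonucleotide. The Python sketch below is an illustration only, under the assumption that the mono-RVD diversity counts distinct nucleotide sequences (the product of the number of bases allowed at each IUPAC position) and that the di-RVD diversity is its square; the names `IUPAC`, `nucleotide_diversity` and `LIBRARY_OLIGOS` are introduced here for the example.

```python
from math import prod

# Standard IUPAC nucleotide codes and the bases each one allows.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def nucleotide_diversity(oligo: str) -> int:
    """Number of distinct sequences encoded by a degenerate oligonucleotide."""
    return prod(len(IUPAC[base]) for base in oligo.upper())


# Oligonucleotides B1 to B9 as listed in Table 1 (SEQ ID NO: 27-34 and 37).
LIBRARY_OLIGOS = {
    "Lib 1": "gtctccagcgcctgcttgccgcccHNSaYgctggcgatggccacctgctc",
    "Lib 2": "gtctccagcgcctgcttgccgccaBNSaYgctggcgatggccacctgctc",
    "Lib 3": "gtctccagcgcctgcttgccgcccHNaSYgctggcgatggccacctgctc",
    "Lib 4": "gtctccagcgcctgcttgccgccHBNSaYgctggcgatggccacctgctc",
    "Lib 5": "gtctccagcgcctgcttgccgccGWDMHagctggcgatggccacctgctc",
    "Lib 6": "gtctccagcgcctgcttgccgccMHaMHagctggcgatggccacctgctc",
    "Lib 7": "gtctccagcgcctgcttgccgccaSYcHNgctggcgatggccacctgctc",
    "Lib 8": "gtctccagcgcctgcttgccgccMTYcHNgctggcgatggccacctgctc",
    "Lib 9": "gtctccagcgcctgcttgccgccMNNMNNgctggcgatggccacctgctc",
}

if __name__ == "__main__":
    for library, oligo in LIBRARY_OLIGOS.items():
        mono = nucleotide_diversity(oligo)
        # Two independently randomized blocks multiply their diversities.
        print(library, "mono-RVD:", mono, "di-RVD:", mono ** 2)
```

Under these assumptions the computed values agree with the mono-RVD and di-RVD columns of Table 1. The count is at the nucleotide level, so the number of distinct amino acid pairs encoded is smaller whenever several codons specify the same residue.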

All DNA fragments used in the different steps were purified by gel extraction. In brief, for the smaller libraries (lib1-8), the eight DNA fragments containing the randomized 6 base pairs were generated using oligonucleotide A1 (SEQ ID NO: 26) combined with B1-B8 (SEQ ID NO: 27 to 34), and the complementary fragment was generated using oligonucleotide C1 (SEQ ID NO: 35) combined with D1 (SEQ ID NO: 36). The assembly PCRs were performed using oligonucleotides A1 and C1. To prepare the starting biotinylated RVD block library used for the array synthesis, the assembly PCR was amplified by PCR using primers A2 (SEQ ID NO: 38) and C2 (SEQ ID NO: 39). The PCR product was purified and digested with SfaNI. To prepare the RVD block library to be used in position 2, the assembly PCR was purified and digested with BbvI. The use of type IIS restriction enzymes allows creation of compatible overhangs between blocks. For the fully randomized library, mono-RVDs were prepared as described for the smaller libraries, except that oligonucleotide A2 (SEQ ID NO: 38) was used with B9 (SEQ ID NO: 37), and C2 (SEQ ID NO: 39) was used instead of C1 (SEQ ID NO: 35), for the first PCR and the subsequent assembly PCR.

The final RVD array libraries containing 1 or 2 randomized blocks (SEQ ID NO: 41 to 58) were synthesized using a solid support method composed of consecutive restriction/ligation/washing steps as shown in FIG. 1. In brief, the first library block was immobilized on a solid support through biotin/streptavidin interaction, the second library block was ligated to the first and, after SfaNI digestion, the remainder of the array (i.e. the RVD array without the RVDs from the library, SEQ ID NO: 59), pre-synthesized by the same method, was ligated to the libraries. Given the synthesis conditions chosen, up to 50% of the recovered arrays are expected to be mono-RVD libraries; the fraction of arrays not containing a library block is expected to be negligible. The RVD array libraries were first cloned in a shuttle pAPG10 plasmid. The plasmid was transformed in E. coli, colonies representing between 5 and 50% of the total library diversity were scraped from the petri dishes, and DNA was recovered by standard miniprep techniques. The insert of interest was recovered by restriction (BbvI and SfaNI) followed by gel extraction and cloning into a yeast expression plasmid.

Cloning of the RVD Array Collection in the TAL Backbone

The amino acid sequences of the N-terminal and C-terminal domains and of the RVDs were based on the AvrBs3 TAL (ref: GenBank: X16130.1, SEQ ID NO: 78). The TAL backbone used in these experiments (pCLS9944, SEQ ID NO: 60) was derived from the previously described pCLS7183 (SEQ ID NO: 61). This backbone, pCLS9944, contains an additional N-terminal NLS sequence followed by an HA tag compared to the original pCLS7183. The C-terminal and the N-terminal domains are separated by two BsmBI restriction sites. The RVD array libraries (SEQ ID NO: 41 to 58) were subcloned in pCLS9944 using the type IIS restriction enzymes BsmBI for the receiving plasmid and BbvI and SfaNI for the inserted RVD sequence, leading to the nine libraries. Colonies were scraped and DNA recovered by standard miniprep techniques.

TALE-Nuclease Activities in Yeast

All the libraries (558 clones after yeast transformation) were screened on a target set containing the 16 possible base combinations in positions 1 and 2, which allows the same target set to be used for libraries having 1 or 2 RVDs randomized. All the yeast target reporter plasmids containing the TALE-Nuclease DNA target collection sequences were constructed as previously described (International PCT Application WO 2004/067736; Epinat, Arnould et al. 2003; Chames, Epinat et al. 2005; Arnould, Chames et al. 2006; Smith, Grizot et al. 2006). The collections of TALE-Nucleases were tested at 37° C. and 30° C. in our previously described yeast SSA assay (International PCT Application WO 2004/067736; Epinat, Arnould et al. 2003; Chames, Epinat et al. 2005; Arnould, Chames et al. 2006; Smith, Grizot et al. 2006) as pseudo-palindromic sequences (two identical recognition sequences are placed facing each other on both DNA strands) on their target collections (SEQ ID NO: 62 to 77, Table 2).

TABLE 2 Target collections for libraries screening.

Name          Sequence                               SEQ ID NO:
AAG_RAGT2L10  TAAGCACTTATatgtgtgtaacaggtATAAGTGCTTA  62
ACG_RAGT2L10  TACGCACTTATatgtgtgtaacaggtATAAGTGCGTA  63
AGG_RAGT2L10  TAGGCACTTATatgtgtgtaacaggtATAAGTGCCTA  64
ATG_RAGT2L10  TATGCACTTATatgtgtgtaacaggtATAAGTGCATA  65
CAG_RAGT2L10  TCAGCACTTATatgtgtgtaacaggtATAAGTGCTGA  66
CCG_RAGT2L10  TCCGCACTTATatgtgtgtaacaggtATAAGTGCGGA  67
CGG_RAGT2L10  TCGGCACTTATatgtgtgtaacaggtATAAGTGCCGA  68
CTG_RAGT2L10  TCTGCACTTATatgtgtgtaacaggtATAAGTGCAGA  69
GAG_RAGT2L10  TGAGCACTTATatgtgtgtaacaggtATAAGTGCTCA  70
GCG_RAGT2L10  TGCGCACTTATatgtgtgtaacaggtATAAGTGCGCA  71
GGG_RAGT2L10  TGGGCACTTATatgtgtgtaacaggtATAAGTGCCCA  72
GTG_RAGT2L10  TGTGCACTTATatgtgtgtaacaggtATAAGTGCACA  73
TAG_RAGT2L10  TTAGCACTTATatgtgtgtaacaggtATAAGTGCTAA  74
TCG_RAGT2L10  TTCGCACTTATatgtgtgtaacaggtATAAGTGCGAA  75
TGG_RAGT2L10  TTGGCACTTATatgtgtgtaacaggtATAAGTGCCAA  76
TTG_RAGT2L10  TTTGCACTTATatgtgtgtaacaggtATAAGTGCAAA  77
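The pseudo-palindromic layout of the Table 2 targets (a recognition sequence, a lower-case spacer, then the reverse complement of the recognition sequence) can be reproduced with a short Python sketch. This is an illustration, not the construction procedure used for the reporter plasmids: the function names are hypothetical, and the decomposition of each half-site into a leading T, two variable bases and the constant `GCACTTAT` is read off the table.

```python
from itertools import product

# Translation table for complementing DNA while preserving letter case.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

SPACER = "atgtgtgtaacaggt"       # lower-case spacer shared by all Table 2 targets
CONSTANT_SUFFIX = "GCACTTAT"     # invariant part of each half-site, per Table 2


def reverse_complement(seq: str) -> str:
    """Reverse complement of an unambiguous DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def pseudo_palindromic_target(dinucleotide: str) -> str:
    """Half-site + spacer + reverse complement of the same half-site."""
    half_site = "T" + dinucleotide + CONSTANT_SUFFIX
    return half_site + SPACER + reverse_complement(half_site)


def target_collection() -> dict:
    """All 16 targets, keyed by the naming pattern used in Table 2."""
    return {
        f"{a}{b}G_RAGT2L10": pseudo_palindromic_target(a + b)
        for a, b in product("ACGT", repeat=2)
    }


if __name__ == "__main__":
    for name, sequence in target_collection().items():
        print(name, sequence)
```

Iterating over the two variable bases in alphabetical order yields the targets in the same order as SEQ ID NO: 62 to 77.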

TALE-Nuclease cleavage activity levels of individual clones of the library on the complete collection of targets in yeast were recorded. Plasmid DNA of clones having activity on at least one target was recovered using standard yeast biology techniques and transformed in E. coli, and plasmid DNA from individual colonies was recovered by standard molecular biology techniques. The plasmid DNA was sequenced and retransformed in yeast for a secondary screen. Table 3 represents the mean activity (screen 1 and 2) of three clones in which RVD 1 was randomized (SEQ ID NO: 23 to 25) recovered from a subset of the libraries.

TABLE 3 Mean activities of three clones with one RVD randomized on a series of targets (SEQ ID NO: 62-77) in our previously described yeast SSA assay (International PCT Application WO 2004/067736; Epinat, Arnould et al. 2003; Chames, Epinat et al. 2005; Arnould, Chames et al. 2006; Smith, Grizot et al. 2006) at 30° C. − indicates no detectable activity, + indicates low activity, ++ medium activity and +++ high activity.

Variable diresidue               Targeted base: A C G T
Classical  HD (SEQ ID NO: 19)    ++++ + +++
Classical  NN (SEQ ID NO: 20)    +++ +/− +++ −
Classical  NG (SEQ ID NO: 21)    +++++ +/− −
Classical  NI (SEQ ID NO: 22)    +++ +/− + −
New        TL (SEQ ID NO: 23)    +++ − − −
New        VT (SEQ ID NO: 24)    +++ +/− +++ −
New        SW (SEQ ID NO: 25)    +/− − − −

Example 2

To design new RVD/target pairs (in the context of a TALE-nuclease), an extensive characterization of the activity in yeast of libraries having position 12 and/or 13 randomized was performed. The randomization was performed in NNK libraries on positions 12 and 13 of a repeat unit inserted at position 1 to 4 of the array of 9.5 repeat units.
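The coding capacity of an NNK randomization can be checked with the standard genetic code. The Python sketch below is illustrative only: it expands the NNK pattern, translates each codon, and computes the reverse complement of the degenerate pattern. All names (`CODON_TABLE`, `expand`, `reverse_complement`) are introduced for this example, and reading the degenerate primer as the reverse strand of the randomized codons is an assumption.

```python
from itertools import product

# Standard genetic code, built in the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): AMINO_ACIDS[i]
    for i, codon in enumerate(product(BASES, repeat=3))
}

# Only the IUPAC codes needed for this example.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "ACGT", "K": "GT", "M": "AC"}
DEGENERATE_COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A",
                         "N": "N", "K": "M", "M": "K"}


def expand(pattern: str) -> list:
    """All unambiguous sequences matching a degenerate pattern."""
    return ["".join(bases) for bases in product(*(IUPAC[b] for b in pattern))]


def reverse_complement(pattern: str) -> str:
    """Reverse complement of a degenerate pattern (subset of IUPAC codes)."""
    return "".join(DEGENERATE_COMPLEMENT[b] for b in reversed(pattern))


if __name__ == "__main__":
    nnk_codons = expand("NNK")
    encoded = {CODON_TABLE[codon] for codon in nnk_codons}
    print("NNK codons:", len(nnk_codons))
    print("amino acids encoded:", len(encoded - {"*"}))
    print("stop codons:", [c for c in nnk_codons if CODON_TABLE[c] == "*"])
    print("two NNK codons, nucleotide sequences:", len(expand("NNKNNK")))
    print("reverse complement of NNKNNK:", reverse_complement("NNKNNK"))
```

NNK restricts the third codon position to G or T, which keeps all 20 amino acids reachable while excluding two of the three stop codons. Two randomized codons give 32 × 32 nucleotide sequences, which is the nucleotide-level diversity to keep in mind when judging how many clones are screened per library.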

The randomization of positions 12 and 13 was performed using degenerate oligonucleotides (Table 4, SEQ ID NO: 79-84) and conventional Overlap Extension (OE) PCR techniques using an NG mono-repeat unit (SEQ ID NO: 85) in a pAPG10 plasmid (SEQ ID NO: 86) as template. All DNA fragments used in the different steps were purified by appropriate techniques. In brief, the DNA fragment containing the randomized 6 base pairs was generated using oligonucleotide E1 (SEQ ID NO: 79) combined with E2 (SEQ ID NO: 80), leading to FRAG1, and the complementary fragment was generated using oligonucleotide F1 (SEQ ID NO: 81) combined with F2 (SEQ ID NO: 82), leading to FRAG2. The assembly PCR of FRAG1 and FRAG2 was performed using oligonucleotides G1 (SEQ ID NO: 83) and G2 (SEQ ID NO: 84) to allow biotinylation of the fragment. The PCR product was further purified and digested with SfaNI.

TABLE 4 List of oligonucleotides (5′→3′) used to introduce diversity in position 12 and 13 of an NG block.

Oligonucleotide Names  Sequences                                              SEQ ID NO:
Oligo E1               cccagtcacgacgttgtaaaac                                 79
Oligo E2               gtctccagcgcctgcttgccgccMNNMNNgctggcgatggccaccacctgctc  80
Oligo F1               ggcaagcaggcgctggagacgg                                 81
Oligo F2               cacaggaaacagctatgaccatg                                82
Oligo G1               Biotin-cccagtcacgacgttgtaaaac                          83
Oligo G2               cccggtaccgcatctcgagg                                   84

Library A in Position 1 of the Array

For this collection in position 1 of the TALE array, the desired building block coding for TALE array A2-A10 (SEQ ID NO: 87) was pre-prepared (BbvI digested) and coupled (ligated) to the immobilized block (randomized in positions 12 and 13) via a solid support technology (FIG. 2). The final product was recovered using enzymatic restriction (SfaNI and BbvI digestions) and cloned in a yeast pCLS9944 expression plasmid (SEQ ID NO: 60). After transformation in E. coli, 1200 colonies were individually picked and grown overnight, and plasmid DNA was extracted using standard procedures.

Libraries B, C and D in Position 2, 3 and 4 of the Array

For these libraries in position 2, 3 and 4 of the TALE array, the desired building blocks coding for RVD array B03-B10 (SEQ ID NO: 88) for library B, C04-C10 (SEQ ID NO: 89) for library C and D05-D10 (SEQ ID NO: 90) for library D were pre-prepared and coupled to the randomized block via a solid support technology and steps of enzymatic restriction and digestion. The coupled intermediate products were then subcloned in the shuttle pAPG10 plasmid. Colonies (at least 4 times the diversity of the libraries) were scraped from the agarose plates, plasmid DNA was extracted using standard techniques and the intermediate array constructs containing the randomized block in position 1 were recovered using enzymatic restriction (BbvI and SfiI). These intermediate array constructs containing the randomized block in position 1 were coupled (ligated) to immobilized blocks coding for B01 (SEQ ID NO: 91) for library B, C01-C02 (SEQ ID NO: 92) for library C and D01-D03 (SEQ ID NO: 93) for library D, via a solid support technology (FIG. 2). The final products were recovered using enzymatic restriction (SfaNI and BbvI digestions) and cloned in a yeast expression plasmid pCLS9944 (SEQ ID NO: 60). After transformation in E. coli, 1200 colonies were individually picked and grown overnight, and plasmid DNA was extracted using standard procedures.
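The oversampling figures quoted above (colonies numbering at least 4 times the library diversity, or 1200 picked colonies) can be put in perspective with a simple sampling estimate. The Python sketch below is a simplification under a stated assumption: every variant of the library is equally likely to be picked, which real libraries only approximate. The function names and the use of 1024 nucleotide-level variants for a two-codon NNK library are choices made for this example.

```python
import math


def probability_variant_sampled(library_size: int, clones: int) -> float:
    """Chance that one given variant appears at least once among the clones.

    Assumes each clone is drawn uniformly and independently from the library.
    """
    return 1.0 - (1.0 - 1.0 / library_size) ** clones


def clones_needed(library_size: int, probability: float) -> int:
    """Smallest number of clones giving at least `probability` per variant."""
    return math.ceil(math.log(1.0 - probability) / math.log(1.0 - 1.0 / library_size))


if __name__ == "__main__":
    diversity = 32 * 32  # nucleotide-level diversity of two NNK codons
    for clones in (1200, 4 * diversity):
        p = probability_variant_sampled(diversity, clones)
        print(f"{clones} clones: P(variant seen) = {p:.3f}")
    print("clones for 95% per-variant coverage:", clones_needed(diversity, 0.95))
```

With fourfold oversampling the per-variant probability is close to 1 − e⁻⁴, about 98%, which is a common rationale for this kind of threshold. Coverage at the amino acid level is higher than at the nucleotide level because several codons encode the same residue.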

TALE-Nuclease Library Activities in Yeast

DNA plasmids coding for all members of the libraries were individually transformed in yeast cells, leading to 1144, 1149, 1148 and 1150 transformants for libraries A, B, C and D respectively.

All the yeast target reporter plasmids containing the TALE-Nuclease DNA target collection sequences were constructed as previously described (International PCT Application WO 2004/067736; Epinat, Arnould et al. 2003; Chames, Epinat et al. 2005; Arnould, Chames et al. 2006; Smith, Grizot et al. 2006). The libraries of TALE-Nucleases were tested at 37° C. and 30° C. in our previously described yeast SSA assay (International PCT Application WO 2004/067736; Epinat, Arnould et al. 2003; Chames, Epinat et al. 2005; Arnould, Chames et al. 2006; Smith, Grizot et al. 2006) as pseudo-palindromic sequences (two identical recognition sequences are placed facing each other on both DNA strands) on their respective targets (containing A, C, G or T at the position of the library block, Table 5, SEQ ID NO: 94 to SEQ ID NO: 109).

TABLE 5 List of pseudo-palindromic sequence targets (two identical recognition sequences are placed facing each other on both DNA strands) in our previously described yeast SSA assay (International PCT Application WO 2004/067736; Epinat, Arnould et al. 2003; Chames, Epinat et al. 2005; Arnould, Chames et al. 2006; Smith, Grizot et al. 2006) at 30° C., used for activity screens in yeast of libraries A, B, C and D.

Target Names         Target Sequences                       SEQ ID NO:
Target position 1 A  TAGTTACTTATatgtgtgtaacaggtATAAGTAACTA  94
Target position 1 C  TCGTTACTTATatgtgtgtaacaggtATAAGTAACGA  95
Target position 1 G  TGGTTACTTATatgtgtgtaacaggtATAAGTAACCA  96
Target position 1 T  TTGTTACTTATatgtgtgtaacaggtATAAGTAACAA  97
Target position 2 A  TTAGCACTTATatgtgtgtaacaggtATAAGTGCTAA  98
Target position 2 C  TTCGCACTTATatgtgtgtaacaggtATAAGTGCGAA  99
Target position 2 G  TTGGCACTTATatgtgtgtaacaggtATAAGTGCCAA  100
Target position 2 T  TTTGCACTTATatgtgtgtaacaggtATAAGTGCAAA  101
Target position 3 A  TGGATACTTATatgtgtgtaacaggtATAAGTATCCA  102
Target position 3 C  TGGCTACTTATatgtgtgtaacaggtATAAGTAGCCA  103
Target position 3 G  TGGGTACTTATatgtgtgtaacaggtATAAGTACCCA  104
Target position 3 T  TGGTTACTTATatgtgtgtaacaggtATAAGTAACCA  105
Target position 4 A  TGGTAACTTATatgtgtgtaacaggtATAAGTTACCA  106
Target position 4 C  TGGTCACTTATatgtgtgtaacaggtATAAGTGACCA  107
Target position 4 G  TGGTGACTTATatgtgtgtaacaggtATAAGTCACCA  108
Target position 4 T  TGGTTACTTATatgtgtgtaacaggtATAAGTAACCA  109

TALE-Nuclease cleavage activities were recorded for all members of the libraries and are summarized in FIGS. 3, 4, 5 and 6 for libraries A, B, C and D respectively. DNA of 101, 105, 136 and 128 clones (for libraries A, B, C and D respectively) was sequenced.

Insertion of Non-Natural RVDs in 15.5 Repeats Arrays and Activities in Yeast

DNA coding for arrays containing non-natural RVDs in position 7 and 11 of the arrays was synthesized and subcloned in a pAPG10 plasmid (GeneCust) (SEQ ID NO: 86), leading to array pCLS19101 (NM in position 7 and LP in position 11) (SEQ ID NO: 110) and array pCLS19102 (SD in position 7 and VG in position 11) (SEQ ID NO: 111). The repeat-containing arrays were then subcloned in a yeast expression plasmid pCLS9944 (SEQ ID NO: 60) using the BsmBI restriction enzyme and standard molecular biology procedures, leading respectively to half-TALE-Nucleases pCLS20349 (SEQ ID NO: 112) and pCLS20350 (SEQ ID NO: 113). The counterpart of these two half-TALE-Nucleases containing only the 4 canonical RVDs (NI, HD, NG and NN), as well as the second half-TALE-Nuclease allowing the formation of a heterodimeric TALE-Nuclease, were synthesized using solid support methods and subcloned in a yeast expression plasmid pCLS9944 (SEQ ID NO: 60), leading respectively to pCLS20735 (SEQ ID NO: 114) and pCLS20736 (SEQ ID NO: 115).

All the yeast target reporter plasmids were constructed as previously described (International PCT Applications WO 2004/067736; Epinat, Arnould et al. 2003; Chames, Epinat et al. 2005; Arnould, Chames et al. 2006; Smith, Grizot et al. 2006). The TALE-Nucleases were tested at 37° C. in our yeast SSA assay previously described (International PCT Applications WO 2004/067736; Epinat, Arnould et al. 2003; Chames, Epinat et al. 2005; Arnould, Chames et al. 2006; Smith, Grizot et al. 2006) on heterodimeric sequences (two different recognition sequences are placed facing each other on both DNA strands), using 2 targets (A and B) varying at bases 7 and 11 (relative to the T₀) (Table 6, SEQ ID NO: 116 to SEQ ID NO: 117).

TABLE 6

Target name    Target sequence                                      SEQ ID NO:
Target A       TCTGACACAACTGTGTTcactagcaacctcaaACAGACACCATGGTGCA    116
Target B       TCTGACATAACAGTGTTcactagcaacctcaaACAGACACCATGGTGCA    117

List of heterodimeric sequence targets A and B varying at bases 7 and 11 (two different recognition sequences are placed facing each other on both DNA strands) in our yeast SSA assay previously described (International PCT Applications WO 2004/067736; Epinat, Arnould et al. 2003; Chames, Epinat et al. 2005; Arnould, Chames et al. 2006; Smith, Grizot et al. 2006) at 37° C., used for activity screens in yeast of NM/LP- and SD/VG-containing half-TALE-Nucleases.
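As a quick check of the statement that targets A and B vary at bases 7 and 11 relative to the T₀, the two sequences of Table 6 can be compared position by position. The sketch below assumes that index 0 of each sequence is the T₀; the function name is hypothetical and used only for illustration.

```python
# Minimal sketch: list the positions at which two equal-length targets differ.
# Index 0 is assumed to be the T0 of the left recognition sequence.

def differing_positions(seq_a: str, seq_b: str) -> list[int]:
    """Return the 0-based positions where the two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have the same length")
    return [i for i, (a, b) in enumerate(zip(seq_a, seq_b)) if a != b]


# Sequences taken from Table 6 (SEQ ID NO: 116 and SEQ ID NO: 117)
TARGET_A = "TCTGACACAACTGTGTTcactagcaacctcaaACAGACACCATGGTGCA"
TARGET_B = "TCTGACATAACAGTGTTcactagcaacctcaaACAGACACCATGGTGCA"

print(differing_positions(TARGET_A, TARGET_B))
# [7, 11]
```

Under this indexing, target A carries C and T at positions 7 and 11, whereas target B carries T and A; the spacer and the second recognition sequence are identical in both targets.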

TALE-Nuclease cleavage activities were recorded for all three pairs: pCLS20349/pCLS20736, pCLS20350/pCLS20736 and pCLS20735/pCLS20736 (Table 7). These results confirm that the new RVDs characterized in the present invention have a higher specificity than RVDs previously described (WO2011/146121).

TABLE 7

Activities of the three TALE-Nuclease pairs on heterodimeric sequence targets A and B (two different recognition sequences are placed facing each other on both DNA strands) in our yeast SSA assay previously described (International PCT Applications WO 2004/067736; Epinat, Arnould et al. 2003; Chames, Epinat et al. 2005; Arnould, Chames et al. 2006; Smith, Grizot et al. 2006) at 30° C. ++ indicates medium activity and +++ high activity.

TALE-Nuclease pair       Target A    Target B
pCLS20349/pCLS20736      +++         ++
pCLS20350/pCLS20736      +++         +++
pCLS20735/pCLS20736      +++         +++

LIST OF CITED REFERENCES

-   Arnould, S., P. Chames, et al. (2006). “Engineering of large numbers of highly specific homing endonucleases that induce recombination on novel DNA targets.” J Mol Biol 355(3): 443-58.
-   Boch, J., H. Scholze, et al. (2009). “Breaking the code of DNA binding specificity of TAL-type III effectors.” Science 326(5959): 1509-12.
-   Bogdanove, A. J., S. Schornack, et al. (2010). “TAL effectors: finding plant genes for disease and defense.” Curr Opin Plant Biol 13(4): 394-401.
-   Cermak, T., E. L. Doyle, et al. (2011). “Efficient design and assembly of custom TALEN and other TAL effector-based constructs for DNA targeting.” Nucleic Acids Res 39(12): e82.
-   Chames, P., J. C. Epinat, et al. (2005). “In vivo selection of engineered homing endonucleases using double-strand break induced homologous recombination.” Nucleic Acids Res 33(20): e178.
-   Christian, M., T. Cermak, et al. (2010). “Targeting DNA double-strand breaks with TAL effector nucleases.” Genetics 186(2): 757-61.
-   Dayhoff, M. O., R. Schwartz and B. C. Orcutt (1978). “A model of evolutionary change in proteins.” Atlas of Protein Sequence and Structure (volume 5, supplement 3 ed.). Nat. Biomed. Res. Found.: 345-358.
-   Deng, D., C. Yan, et al. (2012). “Structural basis for sequence-specific recognition of DNA by TAL effectors.” Science 335(6069): 720-3.
-   Epinat, J. C., S. Arnould, et al. (2003). “A novel engineered meganuclease induces homologous recombination in yeast and mammalian cells.” Nucleic Acids Res 31(11): 2952-62.
-   Geissler, R., H. Scholze, et al. (2011). “Transcriptional activators of human genes with programmable DNA-specificity.” PLoS One 6(5): e19509.
-   Henikoff, S. and J. G. Henikoff (1992). “Amino acid substitution matrices from protein blocks.” Proc Natl Acad Sci USA 89(22): 10915-9.
-   Huang, P., A. Xiao, et al. (2011). “Heritable gene targeting in zebrafish using customized TALENs.” Nat Biotechnol 29(8): 699-700.
-   Li, L., M. J. Piatek, et al. (2012). “Rapid and highly efficient construction of TALE-based transcriptional regulators and nucleases for genome modification.” Plant Mol Biol 78(4-5): 407-16.
-   Li, T., S. Huang, et al. (2011). “TAL nucleases (TALNs): hybrid proteins composed of TAL effectors and FokI DNA-cleavage domain.” Nucleic Acids Res 39(1): 359-72.
-   Mahfouz, M. M., L. Li, et al. (2012). “Targeted transcriptional repression using a chimeric TALE-SRDX repressor protein.” Plant Mol Biol 78(3): 311-21.
-   Mahfouz, M. M., L. Li, et al. (2011). “De novo-engineered transcription activator-like effector (TALE) hybrid nuclease with novel DNA binding specificity creates double-strand breaks.” Proc Natl Acad Sci USA 108(6): 2623-8.
-   Mak, A. N., P. Bradley, et al. (2012). “The crystal structure of TAL effector PthXo1 bound to its DNA target.” Science 335(6069): 716-9.
-   Miller, J. C., S. Tan, et al. (2011). “A TALE nuclease architecture for efficient genome editing.” Nat Biotechnol 29(2): 143-8.
-   Morbitzer, R., J. Elsaesser, et al. (2011). “Assembly of custom TALE-type DNA binding domains by modular cloning.” Nucleic Acids Res 39(13): 5790-9.
-   Moscou, M. J. and A. J. Bogdanove (2009). “A simple cipher governs DNA recognition by TAL effectors.” Science 326(5959): 1501.
-   Murakami, M. T., et al. (2010). “The repeat domain of the type III effector protein PthA shows a TPR-like structure and undergoes conformational changes upon DNA interaction.” Proteins 78: 3386-3395.
-   Mussolino, C., R. Morbitzer, et al. (2011). “A novel TALE nuclease scaffold enables high genome editing activity in combination with low toxicity.” Nucleic Acids Res 39(21): 9283-93.
-   Sander, J. D., L. Cade, et al. (2011). “Targeted gene disruption in somatic zebrafish cells using engineered TALENs.” Nat Biotechnol 29(8): 697-8.
-   Smith, J., S. Grizot, et al. (2006). “A combinatorial approach to create artificial homing endonucleases cleaving chosen sequences.” Nucleic Acids Res.
-   Tesson, L., C. Usal, et al. (2011). “Knockout rats generated by embryo microinjection of TALENs.” Nat Biotechnol 29(8): 695-6.
-   Weber, E., R. Gruetzner, et al. (2011). “Assembly of designer TAL effectors by Golden Gate cloning.” PLoS One 6(5): e19722.
-   Yakubovskaya, E., E. Mejia, et al. (2010). “Helix unwinding and base flipping enable human MTERF1 to terminate mitochondrial transcription.” Cell 141(6): 982-93.
-   Zhang, F., L. Cong, et al. (2011). “Efficient construction of sequence-specific TAL effectors for modulating mammalian transcription.” Nat Biotechnol 29(2): 149-53.

The invention claimed is:
 1. A method for modifying the genetic material of a cell comprising: (a) selecting a nucleic acid target sequence present on a chromosome of a mammalian cell; (b) engineering a protein comprising at least: (i) one Transcription Activator-Like Effector (TALE) domain wherein said TALE domain comprises a plurality of TALE repeat sequences comprising each one a Repeat Variable Diresidue region (RVD) which is responsible for the binding to one specific nucleotide in the nucleic acid target sequence present on a chromosome of the mammalian cell, wherein one or more RVD is selected from the group consisting of: PI, DL, FL, GL, IL, KL, LL, YL, MM, WY, PV, SW, XF for recognizing A, wherein X represents one amino acid residue selected from the group consisting of A, G, V, L, I, M, S, T, C, P, D, E, F, Y, W, Q, N, and K; RE for recognizing C; ER, FR, GR, LR, QR, VR for recognizing G; and (ii) an endonuclease domain to cleave genetic material within the nucleic acid target sequence; (c) contacting said engineered protein with said nucleic acid target sequence in the mammalian cell such that the engineered protein binds to the nucleic acid target sequence and cleaves the chromosome within the nucleic acid target sequence to create a double strand break, wherein the double strand break is repaired by the cell through non-homologous end joining (NHEJ) resulting in a genetic modification in the chromosome.
 2. A method for inducing homologous gene targeting in a mammalian cell comprising: (a) providing a mammalian cell comprising a nucleic acid target sequence present on a chromosome; (b) engineering a chimeric protein comprising at least: (i) one Transcription Activator-Like Effector (TALE) domain wherein said TALE domain comprises a plurality of TALE repeat sequences comprising each one a Repeat Variable Diresidue region (RVD) which is responsible for the binding to one specific nucleotide in the nucleic acid target sequence present on a chromosome of the mammalian cell, wherein one or more RVD is selected from the group consisting of: PI, DL, FL, GL, IL, KL, LL, YL, MM, WY, PV, SW, XF for recognizing A, wherein X represents one amino acid residue selected from the group consisting of A, G, V, L, I, M, S, T, C, P, D, E, F, Y, W, Q, N, and K; RE for recognizing C; ER, FR, GR, LR, QR, VR for recognizing G; and (ii) an endonuclease domain to cleave genetic material within the nucleic acid target sequence; and (c) introducing said chimeric protein into said mammalian cell such that the engineered protein binds to the nucleic acid target sequence and cleaves the chromosome within the nucleic acid target sequence to create a double strand break, (d) introducing into the cell an exogenous nucleic acid comprising a sequence homologous to at least a portion of the nucleic acid target sequence, wherein homologous recombination occurs between said exogenous nucleic acid and the nucleic acid target sequence processes genetic material within or adjacent to the specific nucleic acid target sequence.
 3. A method for generating an animal comprising: (a) providing a cell comprising a nucleic acid target sequence into which it is desired to introduce a genetic modification; (b) modifying the genetic material of a cell according to the method of claim 1; (c) generating an animal from the cell or progeny thereof, in which a genetic modification has occurred.
 4. The method according to claim 2, wherein the cell is a human cell.
 5. The method of claim 2, wherein one or more RVD is selected from the group consisting of: PI, DL, FL, GL, IL, KL, LL, YL, MM, WY, PV, SW, XF for recognizing A.
 6. The method of claim 2, wherein one or more RVD is selected from the group consisting of: RE for recognizing C.
 7. The method of claim 2, wherein one or more RVD is selected from the group consisting of: ER, FR, GR, LR, QR, VR for recognizing G.
 8. The method of claim 1, wherein one or more RVD is selected from the group consisting of: PI, DL, FL, GL, IL, KL, LL, YL, MM, WY, PV, SW, XF for recognizing A.
 9. The method of claim 1, wherein one or more RVD is selected from the group consisting of: RE for recognizing C.
 10. The method of claim 1, wherein one or more RVD is selected from the group consisting of: ER, FR, GR, LR, QR, VR for recognizing G.
 11. The method of claim 1, wherein the genetic modification is a deletion.
 12. The method of claim 1, wherein the genetic modification is an insertion.
 13. The method of claim 1, wherein the genetic modification is a combination of both a deletion and an insertion.